Cyclomatic Complexity in Languages with Closures
Cyclomatic complexity is the number of paths through a program. In programming languages with fixed syntax, this corresponds to the number of IF and FOR statements plus one. If your language of choice features closures, however, things get messier.
Kent Beck twittered this problem today and as I referred him to my implementation in Moose, a Smalltalk analysis tool, I’ll quickly discuss here how Moose deals with it. Since Smalltalk and Ruby are similar, what I discuss might also help to analyze Ruby code.
Note: cyclomatic complexity was first introduced by Tom McCabe in 1976 (the classic paper includes quite some nice graph visualization) and is thus sometime also referred to as McCabe measure or McCabe complexity.
Smalltalk and Ruby both support the definition of user defined control flow constructs through closures. In Smalltalk, even the most basic control IF construct uses closures. However, things are not as simple as just counting the number of all blocks plus one. Closures are either executed for sure (+0), not executed now (+0), maybe executed (+1), or executed xor another (+1 for both taken together).
Let’s consider a bunch of examples:
expression ifTrue: [ ... ]
the block passed to #ifTrue: is only maybe executed, which introduces another code path and thus increases the complexity by one.
expression ifTrue: [ ... ] ifFalse: [ ... ]
either the first block or the second block is executed, which introduces two mutually excluding code paths and thus increases the complexity by one even though there are two blocks.
collection do: [ :each | ... ]
the block passed to #do: is executed for sure except if the collection is empty, which introduces an exceptional code path and thus increases the complexity by one.
[ ... ] on: Exception do: [ ... ]
the first block is executed for sure while the second is only executed if the first block raises an exception, which introduces an exceptional code path and thus increases the complexity by one even though there are two blocks.
collection detect: [ ... ] ifNone: [ ... ]
combines looping with a conditional and thus increases the complexity by two, since it has a total of three paths: the empty case executes the second block, the successful detection case executes the first block, while the failed detection case executes both blocks.
SortedCollection sortBlock: [ ... ]
the block is not executed now and does thus not impact the complexity, there is only one path through above code.
html head: [
html title: 'Example' ].
html body: [
html heading: 'For Example'.
html paragraph: [
html text: 'Lorem ipsum dolor...' ]].
here blocks are used as part of Seaside’s HTML rendering DSL and are all executed for sure. Thus this example, even though looking the most complex so far, does not increase the complexity – there is only one path through above code!
Furthermore, if you are using selectors as blocks you might even get multiple path without a block, as for example
collection do: #selector
which has a complexity of two but does not contain a block at all!
If counting blocks does not do the job, what then?
In the Moose analysis, we ended up using a predefined list of selectors that increase cyclomatic complexity. Computing the complexity is then just a matter of counting the occurrences of listed selectors in the method body plus one. Since users can add their own control flow constructs such a list of selectors is never complete. There are two options to keep the list up-to-date: manual or automated. Moose currently offers neither option.
An automated detection of complexity-contributing selectors could work as follows: For each new method, if that method passes one of its formal parameters as an argument to a selector that is already in the list, add the selector of the new method to the list. It seems likely that we need further refinement for selectors with more one argument. If there is more than receiver argument, we should keep track of their complexity contribution such that we can tell whether the formal parameters are passed to contribution or non-contribution (or even xor-contributing) receiver arguments. In practice however, manually updating the list (as for example, adding Seaside’s #callback: method) might just be good enough for your analysis.
In Moose we predefined the following list of selectors
branches := #(ifTrue: ifFalse: ifTrue:ifFalse: ifFalse:ifTrue:
ifNil: ifNil:ifNotNil: ifNotNil:ifNil: ifNotNil:
at:ifAbsent: at:ifAbsentPut: detect:ifNone:
on:do: and: or: xor:).
loops := #(whileTrue: whileTrue whileFalse whileFalse:
timesRepeat: to:do: do:separatedBy: do: collect: select: reject: inject:into:
detect: detect:ifNone: anySatisfy: allSatisfy:).
and then compute cyclomatic complexity as follows
1 + (selectors intersection: branches) size + (selectors intersection: loops) size
NB, please note how we count occurrences of #detect:ifNone: twice.
Well then, complexity of methods covered.
If we are to compute the complexity of classes or packages, polymorphism messes things up again. About 5% of all message sends are polymorph, ie there is more than one possible receiving method and thus more than one possible path between sender and receiver.
However, polymorphism is not limited to languages with closures but more a general problem of object-oriented languages and shall thus (and because someone needs to bring my son to bed) not be covered here.