Random notes on software, programming and languages.
By Adrian Kuhn

Archive for the ‘Programming Languages’ Category

Dependency Injection vs. Virtual Classes


Wednesday, December 30th, 2009

Every time you create an instance of a class by using the new keyword, you tightly couple your code to that class, and you will not be able to substitute that particular implementation with a different one (at least not without recompiling the code).

Taken from an answer on Stack Overflow that motivates dependency injection. Dependency injection, however, is not the real solution; it is rather a workaround for missing virtual classes. In this post I will show you why.

For example, in

public class MyClass {
  public String sendMessage(String contents) {
    return new MailService().sendMessage(contents);
  }
}

you cannot use a different MailService. This might not seem limiting at first sight, but think for example of testing. Your tests might be better off using a stub service that does not send actual emails.

This can be, and is being, solved with dependency injection, yes. But the actual problem is rather a language problem: classes are not virtual. While methods are virtual and thus looked up at runtime, classes are not.

In the following example, we can change the mail service by subclassing and overriding a method. This works because methods are virtual and thus late bound.

public class MyClass {
  public String sendMessage(String contents) {
    return newMailService().sendMessage(contents);
  }
  public MailService newMailService() {
    return new MailService();
  }
}

What a difference a single whitespace can make!

When newMailService is called, that method name is looked up in the context of the calling method. The context of the calling method is its class. Therefore, by subclassing the containing class we can provide a different mail service.
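
For example, a test might subclass MyClass and override the factory method. (StubMailService is a hypothetical test double, named here for illustration only.)

public class MyClassForTesting extends MyClass {
  @Override
  public MailService newMailService() {
    // StubMailService records messages instead of sending them.
    return new StubMailService();
  }
}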

Imagine the same happened with class names: whenever we call new MailService(), the class name would be looked up in the containing context. Imagine further that the lookup would not stop with the class but bubble up to the containing package. Then, by “subclassing” that package, we could provide a different mail service.

Subclassing a package is nonsense terminology, so let’s rather say that we instantiate a package … and let’s further assume that packages and classes are the same, and that our system is built of nested classes.

public class MyApplication {
  public class MyClass {
    public String sendMessage(String contents) {
      return new "MailService"().sendMessage(contents);
    }
  }
}

I have put the class name MailService in quotes to remind you that it is the name of a virtual class and is thus looked up in the containing context.

Imagine further that we could pass the binding of classes to a class when creating it; then we could instantiate two versions of our application: one for real, and one for testing.

MyApplication forTesting = new MyApplication(MailService = TestMailer);
MyApplication forReal = new MyApplication(MailService = InterwebMailer);

Testing and production are not the only possible contexts. Imagine for example an XML parser. Typically it creates DOM nodes, but maybe you would prefer your own implementation of nodes. Maybe even the DAO objects of your application? There we go:

XMLParser custom = new XMLParser(Node = Application.DAO);
XMLDocument doc = custom.new XMLDocument("example.xml");

NB, the second line is actually valid Java syntax: this is how you create an instance of an inner class from outside the scope of its outer class.

— So this is how dependency injection should really work!

And it is not even that far from reality. Both Beta and Newspeak (the new language by Gilad Bracha) have virtual classes, both have nested classes, and in both languages what I just described is how you decouple your packages.

And, hup hup hup, another design pattern has been disguised as a missing language feature :)

Unit Tests as First-class Entity in Programming Languages


Monday, November 23rd, 2009

To free unit tests from the slavery of objects, hooray!

Languages should support unit tests as first-class entities. In a recent blog post, Cedric Beust, creator of TestNG, advocated the introduction of a unittest keyword that would allow test methods to be put in the class under test rather than in a separate file. He pointed to the D language, which does this.

In fact, the Noop language takes first-class unit testing even further:

We believe that test code looks different from production code. Tests should be very simple, with minimal conditional logic, so that they may be read as documentation. […] A test is an entity, distinct from a class or method whose purpose is to run tests. […] suite is just another test block enclosing some tests, and may be arbitrarily nested, and divided among several files.

RSpec and friends are of course a further example of first-class unit testing. Again, tests are distinct from classes or methods and are enclosed in possibly nested contexts. As in

describe Bowling do
  it "should score 0 for gutter game" do
    bowling = Bowling.new
    20.times { bowling.hit(0) }
    bowling.score.should == 0
  end
end

In the following, I present my vision of first-class unit tests.

Organizing tests in methods and classes is silly. In fact, organizing unit tests in methods and classes is an artifact of Smalltalk’s development environment. (Yes kids, SUnit predates JUnit.) In Smalltalk, the only code that can be edited is method bodies. So let’s see what we need to run tests:

  • an example
  • that runs in
  • a context

An example specifies part of the unit under test. An example should be executable code, such that we can automatically check if the unit under test implements the specification. When example code runs, it runs within a given context (more on that later). The example code can access all state of the context, which thus provides the test fixture.

Let’s nail down the distinction between “example” and “test” first.

In the same way that class and method are static templates for the runtime entities object and (method) activation, test code is the static template for the runtime entity example. And just as objects and activations are at bottom the same kind of thing, an example is the same kind of thing as an object or an activation: a chunk of memory with enough space for each fixture variable and a pointer to its origin.

We can summarize this in a table to make the analogy to classes and methods more apparent.

Template | Realization | Entity     | Context
Class    | creating    | Object     | Instance of outer class
Method   | invoking    | Activation | Activation of calling context
Test     | running     | Example    | Cloned example of producer

Upon execution of a test, a new example is allocated, linked to the example of its context, and then the test code is executed. So the context of a test is just the example that resulted from another test execution. By “resulted from” we mean the state of the example’s memory chunk after that other test has terminated. We refer to that other test as the “producer” since it provides the fixture for the to-be-run “consumer” test.

So how are producer and consumer linked to each other?

Here we differ from classes and methods. Objects are (if your language of choice supports inner classes) linked by the lexical context of their classes. Activations are linked by the dynamic context of the executed methods. For testing, however, we need far more flexible ways of setting up the test harness. We should be able to create examples based on any possible combination of tests.

Of course, nesting of producers as given in RSpec and Noop will be one very common case. But certainly as common will be the case of a chain of tests that is executed multiple times, each time with a different outermost producer. In JUnit this is typically achieved by subclassing a test class and overriding either setup or another helper method; for example, an abstract test class for Collection that tests the collection interface, with a test subclass for each concrete collection subclass that just creates an instance of that subclass (see the sketch below).
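
Here is a minimal JUnit 4 sketch of that pattern; class and method names are made up for illustration.

import static org.junit.Assert.assertTrue;

import java.util.ArrayList;
import java.util.Collection;

import org.junit.Test;

public abstract class AbstractCollectionTest {

  // Each concrete subclass provides an instance of its collection class.
  protected abstract Collection<String> createCollection();

  @Test
  public void addedElementIsContained() {
    Collection<String> collection = createCollection();
    collection.add("element");
    assertTrue(collection.contains("element"));
  }
}

class ArrayListTest extends AbstractCollectionTest {
  @Override
  protected Collection<String> createCollection() {
    return new ArrayList<String>();
  }
}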

However, we might think of more complex setups, for example a test harness that uses the same graph of tests to test the combination of n server examples cross m client examples.

So what we need is a flexible way to connect tests with the examples of other tests.

Think of JExample 2.0 as a test harness DSL.
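
For reference, JExample 1.x already links producer and consumer tests along these lines. (A sketch from memory; the annotation and package names may differ from the released version.)

import static org.junit.Assert.assertEquals;

import java.util.Stack;

import org.junit.Test;
import org.junit.runner.RunWith;

import ch.unibe.jexample.Given;
import ch.unibe.jexample.JExample;

@RunWith(JExample.class)
public class StackTest {

  @Test
  public Stack<String> emptyStack() {
    Stack<String> stack = new Stack<String>();
    assertEquals(0, stack.size());
    return stack; // the returned example becomes the fixture of consumers
  }

  @Given("emptyStack")
  public Stack<String> pushOnEmptyStack(Stack<String> stack) {
    stack.push("element");
    assertEquals(1, stack.size());
    return stack;
  }
}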

This DSL will be provided by tool builders; what the language has to provide is first-class entities for tests, examples and test running (which takes as parameters a test and an example). Of course, the language should also allow cloning of both examples and objects, such that tool builders can avoid side effects when multiple tests expand on the same example.
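
To make this concrete, here is a minimal sketch, in plain Java and with entirely hypothetical names, of the three first-class entities such a language would provide.

import java.util.HashMap;
import java.util.Map;

// A test is a first-class value whose code runs within an example.
interface Test {
  void run(Example self);
}

// An example: a chunk of memory with space for fixture variables
// and a pointer to the example of its producer (its origin).
class Example {
  final Example origin;
  final Map<String, Object> fixture = new HashMap<String, Object>();

  Example(Example origin) {
    this.origin = origin;
  }
}

// Test running takes a test and an example: it allocates a fresh
// example linked to the producer's example, then runs the test code.
class Runner {
  static Example run(Test test, Example producer) {
    Example example = new Example(producer);
    test.run(example);
    return example;
  }
}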

Type-driven design, anyone?


Thursday, November 12th, 2009

Given that unit tests are the opposite of static typing, what is the opposite of test-driven design? Chris Smith’s great article “What to know before debating type systems” compares types and tests with regard to correctness, performance and documentation. However, we’ve learned from the Agile community that tests are most valuable when you let ’em drive the design of your system.

Test-driven design turned 10 this year and has been adopted by many major software companies. The benefits of TDD are well researched: 15–35% more time to release, but 60–90% fewer post-release defects. Well then, at the opposite end of the spectrum, there must be type-driven design, right? Faster time to market but less error-prone?

Wait a moment!

In fact, recent research has shown that “the type system significantly turned out to increase the development time” (PDF). Bad news for Curry–Howard land.

Note: the linked PDF refers to the wrong workshop. In fact, the paper was published at PLATEAU, not at VASE as the footer claims. Craig Anslow, organizer of both workshops, told me that he made a mistake distributing the LaTeX templates.

So, is there such a thing as type-driven design? If it exists, what is it good for? When does type-driven design shine? When does type-driven design break?

Pros

  • It documents your system
  • It forces you to keep documentation coherent
  • It forces you to do structured thinking
  • It is very good as long as you tackle the same old problems

Cons

  • If you start to do something new and creative, it breaks down
  • It’s really, really hard to innovate with types
  • And it takes more time

Can we get the best of both worlds?

Synthesis

  • Your language should be dynamic and not restrict your design with a type system
  • And after that, as an afterthought, you go and add a type system

Pluggable Types to the rescue! Gilad Bracha (alas, he is not on Twitter) proposed optional type systems as a way to add a type system after your design has stabilized. Optional type systems are pluggable: you can add and remove type checks as fits your needs. This helps to keep the type system out of your way while you explore the design of your system, and allows you to plug in a custom type system once you’re done and know your system. Bracha mentions (among others) checkers for aliasing, ownership, and information flow. The choice of the checking is up to the developer.

“The real value of type systems isn’t the early error detection” — Gilad Bracha

Prototypes of a pluggable type system are available for Smalltalk and for Java. The Smalltalk prototype of pluggable types has been developed by Niklaus Haldiman, a former Master’s student of our research lab who is now with Google.

JSR 308, the Java prototype for pluggable types, is being developed by Mahmood Ali and others at MIT’s CSAIL group. JSR 308 is built on top of Java’s annotation processor and likely to be included in Java 7. Mahmood demoed JSR 308 to me at OOPSLA 2009 in Nashville; I was able to write a type system that checks several design patterns within only a couple of hours. You best check it out yourself. The JSR 308 folks are very responsive when you need help, just give it a try.
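
To give a flavor of such checks: JSR 308 allows annotations on any use of a type, and a pluggable checker then enforces them. The snippet below uses the @NonNull qualifier in the style of the Checker Framework that grew out of JSR 308; the exact package name has changed over the years, so treat it as illustrative.

import java.util.ArrayList;
import java.util.List;

import org.checkerframework.checker.nullness.qual.NonNull;

public class Guestbook {

  // The annotation qualifies the element type, not just the variable:
  // this list may never contain null entries.
  private final List<@NonNull String> names = new ArrayList<String>();

  public void sign(@NonNull String name) {
    names.add(name); // verified at compile time by the nullness checker
  }
}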

Alas, Newspeak does not yet feature pluggable types…

Another approach is Software Hardening, that is, the gradual addition of more and more detailed type information as the design of your software is “hardening”, i.e. stabilizing. As you see, it is again the same concept as Gilad’s pluggable types. A prototype of software hardening has been realized in the Thorn language, built by a team around Jan Vitek at Purdue University together with IBM Research. Please refer to the Thorn language website for more information.

The STOP workshop at this year’s ECOOP featured even more funky type systems, such as Philip Wadler’s blame calculus, where (as far as I understood, Greek letters are not my field of expertise) type transitions are tracked in order to blame the right line of code when a dynamically typed program fails. They prove that one can shorten the chain of type transitions to so-called “threesomes”, which brings their approach into the reach of acceptable performance.

…but such suggestive concepts shall not be further discussed here, this is a serious blog after all :)

Cyclomatic Complexity in Languages with Closures


Tuesday, November 10th, 2009

Cyclomatic complexity is the number of linearly independent paths through a program. In programming languages with fixed syntax, this corresponds to the number of IF and FOR statements plus one (the example below illustrates this). If your language of choice features closures, however, things get messier.
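
To illustrate the fixed-syntax case: one FOR plus one IF plus one gives the following Java method a cyclomatic complexity of three.

public class Counting {
  static int countPositives(int[] values) {
    int count = 0;
    for (int each : values) {   // +1
      if (each > 0) {           // +1
        count++;
      }
    }
    return count;
  }
}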

Kent Beck twittered this problem today, and as I referred him to my implementation in Moose, a Smalltalk analysis tool, I’ll quickly discuss here how Moose deals with it. Since Smalltalk and Ruby are similar, what I discuss might also help to analyze Ruby code.

Note: cyclomatic complexity was first introduced by Tom McCabe in 1976 (the classic paper includes quite some nice graph visualizations) and is thus sometimes also referred to as the McCabe measure or McCabe complexity.

Smalltalk and Ruby both support the definition of user-defined control flow constructs through closures. In Smalltalk, even the most basic IF construct uses closures. However, things are not as simple as just counting the number of all blocks plus one. Closures are either executed for sure (+0), not executed now (+0), maybe executed (+1), or executed xor another (+1 for both taken together).

Let’s consider a bunch of examples:

expression ifTrue: [ ... ]

the block passed to #ifTrue: is only maybe executed, which introduces another code path and thus increases the complexity by one.

expression ifTrue: [ ... ] ifFalse: [ ... ]

either the first block or the second block is executed, which introduces two mutually excluding code paths and thus increases the complexity by one even though there are two blocks.

collection do: [ :each | ... ]

the block passed to #do: is executed for sure except if the collection is empty, which introduces an exceptional code path and thus increases the complexity by one.

[ ... ] on: Exception do: [ ... ]

the first block is executed for sure while the second is only executed if the first block raises an exception, which introduces an exceptional code path and thus increases the complexity by one even though there are two blocks.

collection detect: [ ... ] ifNone: [ ... ]

combines looping with a conditional and thus increases the complexity by two, since it has a total of three paths: the empty case executes the second block, the successful detection case executes the first block, while the failed detection case executes both blocks.

SortedCollection sortBlock: [ ... ]

the block is not executed now and thus does not impact the complexity; there is only one path through the above code.

html head: [
  html title: 'Example' ].
html body: [
  html heading: 'For Example'.
  html paragraph: [
    html text: 'Lorem ipsum dolor...' ]].

here blocks are used as part of Seaside’s HTML rendering DSL and are all executed for sure. Thus this example, even though it looks the most complex so far, does not increase the complexity – there is only one path through the above code!

Furthermore, if you are using selectors as blocks you might even get multiple paths without a block, as for example

collection do: #selector

which has a complexity of two but does not contain a block at all!

If counting blocks does not do the job, what then?

In the Moose analysis, we ended up using a predefined list of selectors that increase cyclomatic complexity. Computing the complexity is then just a matter of counting the occurrences of listed selectors in the method body, plus one. Since users can add their own control flow constructs, such a list of selectors is never complete. There are two options to keep the list up to date: manual or automated. Moose currently offers neither option.

An automated detection of complexity-contributing selectors could work as follows: for each new method, if that method passes one of its formal parameters as an argument to a selector that is already in the list, add the selector of the new method to the list. It seems likely that we need further refinement for selectors with more than one argument. In that case, we should keep track of each argument’s complexity contribution, such that we can tell whether the formal parameters are passed as contributing or non-contributing (or even xor-contributing) arguments. In practice however, manually updating the list (for example, adding Seaside’s #callback: method) might just be good enough for your analysis.
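
A rough sketch of that detection in Java, over a toy model of methods and message sends. (All types and names here are made up, and the refinement for selectors with multiple arguments is omitted.)

import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy model: a method has a selector, formal parameters, and message sends.
record Send(String selector, List<String> arguments) {}
record Method(String selector, List<String> parameters, List<Send> sends) {}

class SelectorListBuilder {

  private final Set<String> contributing = new HashSet<String>();

  // Seed the list with the predefined branch and loop selectors.
  SelectorListBuilder(Set<String> predefined) {
    contributing.addAll(predefined);
  }

  // For each new method: if it passes one of its formal parameters
  // to a selector that is already listed, list its own selector too.
  void visit(Method method) {
    for (Send send : method.sends()) {
      if (contributing.contains(send.selector())
          && !Collections.disjoint(send.arguments(), method.parameters())) {
        contributing.add(method.selector());
        return;
      }
    }
  }
}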

In Moose we predefined the following list of selectors

branches := #(ifTrue: ifFalse: ifTrue:ifFalse: ifFalse:ifTrue:
  ifNil: ifNil:ifNotNil: ifNotNil:ifNil: ifNotNil:
  at:ifAbsent: at:ifAbsentPut: detect:ifNone:
  on:do: and: or: xor:).
loops := #(whileTrue: whileTrue whileFalse whileFalse:
  timesRepeat: to:do: do:separatedBy: do: collect: select: reject: inject:into:
  detect: detect:ifNone: anySatisfy: allSatisfy:).

and then compute cyclomatic complexity as follows

1 + (selectors intersection: branches) size + (selectors intersection: loops) size

NB, please note how we count occurrences of #detect:ifNone: twice.

Well then, complexity of methods covered.

If we are to compute the complexity of classes or packages, polymorphism messes things up again. About 5% of all message sends are polymorphic, i.e. there is more than one possible receiving method and thus more than one possible path between sender and receiver.

However, polymorphism is not limited to languages with closures but more a general problem of object-oriented languages and shall thus (and because someone needs to bring my son to bed) not be covered here.

Methods and Classes, the same?


Saturday, July 25th, 2009

We are used to thinking of classes and methods as distinct concepts, whereas in fact they are two of the same kind: a class is a static template for runtime objects, a method is a static template for runtime activation records (or stack frames, if you want).

  • p = new Person("Joe", "Random");

    The code above allocates memory for a new object, with enough space for each field of a person. It executes the body of Person’s constructor with the given parameters, and eventually returns a reference to the allocated memory.

  • bool = set.contains(p);

    The code above allocates memory for a new stack frame, with enough space for each local variable of the #contains method. It executes the body of #contains with the given parameters, and eventually returns a reference to the result of the executed code.

As you see, the only major difference between class creation and method call is the kind of their return value.

If we take a deeper look, each stack frame has a pointer to its origin. The origin is given by the static context. Each method is statically enclosed in a class, so the origin points to the receiving instance of the method call. This allows the method to access fields of the enclosing class. The same applies to objects. Take for example Java’s inner classes: they keep a pointer to an instance of their outer class. This allows inner classes to access fields of the enclosing class.
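
In plain Java, this origin pointer looks as follows (a made-up example):

public class Counter {

  private int count = 0;

  public class Incrementer {
    public void increment() {
      count++; // accesses a field of the enclosing Counter instance
    }
  }

  public static void main(String[] args) {
    Counter counter = new Counter();
    // The qualified "new" passes 'counter' as the inner object's origin.
    Counter.Incrementer incrementer = counter.new Incrementer();
    incrementer.increment();
  }
}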

Of course, I am not the first to observe the uniformity of classes and methods. I am standing on the shoulders of giants such as Kristen Nygaard and Dave Ungar, the creators of BETA and Self respectively. Both of these languages unify classes and methods in a more general concept. BETA is class-based and uses patterns as its sole abstraction. Self is prototype-based and uses, unsurprisingly, prototypes as its sole abstraction.

For further reading, please refer to the HOPL papers of BETA and Self respectively.

 

PS, is the hammer shown on the BETA page actually the language’s logo? That would make it possibly the language with the coolest logo of all time…

MVC Reloaded, the Data-Context-Interaction Paradigm


Sunday, March 29th, 2009

Context is in the air. Today, I stumbled via @frankgerhardt upon an article by Trygve Reenskaug (father of the Model-View-Controller paradigm, for those not familiar with his work) and Jim Coplien about a new architecture paradigm:

“The DCI Architecture: A New Vision of Object-Oriented Programming”
by Trygve Reenskaug and Jim Coplien

The article focuses on how to model use cases. The authors advise against using classes for this, and rather propose to put behavior in algorithms that are expressed in terms of roles. Their approach is broken up into data, context and interaction:

  • The data that is stored in domain objects that are rooted in domain classes;
  • The context that assigns running objects, on demand, to their roles in a scenario;
  • The interactions that describe end-user algorithms in terms of the roles.

Thus, data and interaction are brought together in a specific context. The article goes on to talk about nested contexts, and how traits can be used to implement the paradigm. The authors say that “if a role is an analysis concept (from the mind of the end user), then a trait is a general design concept that represents the role.”

I am not yet sure how their use of context fits into my attempt towards a taxonomy of context. Clearly their context is compositional and dynamic. Even though they discuss whether roles are to be injected into classes at compile time or dynamically at run time, the actual instantiation of a context, i.e. which objects are mapped to which roles, is always established dynamically. (At least as far as I understood their approach.) Given the assignment of roles to objects, their context affects both sender and receiver of messages.

Even though the DCI paradigm can be used to model external context, there is (to my relief) no magic mechanism in place that switches context behind the programmer’s back, as many other proposals for context-oriented programming do. I am happy to see the notion of context being approached from a compositional view, clearly breaking down what context means in terms of both analysis and design, rather than referring to blurry terms such as the weather and room temperature. Or even worse, telling me that the very meaning of context still has to be defined :)

Some further, more personal, notes and remarks:

  • “Even Smalltalk, whose initial vision of objects and a dynamic run-time environment was truly visionary, fell victim to the class compromise,” provides me with a new incentive to look at prototype-based systems, such as Self, and to explore new compositional abstractions built from object fragments and aliases, as I have started doing with Cells and Protalk. For example, how would a system look whose main abstraction is traits rather than classes?
  • “There is another key learning: that domain classes should be dumb. [...] We must separate simple, stable data models from dynamic behavioral models,” reminds me of one of my initial visions for Fame (a now abandoned project), which was to turn Fame models into a typed subspace within a Squeak image. The idea was to have this type model in your Smalltalk image, and then check the type annotations given in the source code not at compile time but by generating runtime assertions. Which is, by the way, what I as a 2nd-semester student thought Java’s type system would do. I recall how shocked I was to learn that type checks are performed at compile time only!
  • And finally, I wonder how the composition of a system into dumb classes and smart interaction contexts is going to match the grouping of the system’s vocabulary (i.e. identifier names, etc.) by topics. We know that classes and topics do not align well; will DCI do a better job? As long as current structural abstractions fail to capture the user’s mental model of a system, which is what Trygve Reenskaug and Jim Coplien’s article argues, any visualization based on structure alone must fail to address the developer’s model as well. And this reinforces my conviction that the vocabulary- and thus domain-based layout of Software Cartography is the way to go in software visualization.

Check printf at compile-time


Monday, March 16th, 2009

Since the introduction of #printf in Java 1.5, I have always wished to check my format strings and their arguments at compile time.

Now, it can be done.

The following compiler plugin checks the format string and format arguments of any call to #printf and #format. Of course, checking is limited to format strings known at compile time.
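
For instance, both calls in the following (made-up) Example.java compile fine with plain javac, but are flagged by the plugin:

public class Example {
  public static void main(String[] args) {
    System.out.printf("%s is %d years old%n", "Joe");        // too few arguments
    System.out.printf("%s is %d years old%n", "Joe", "old"); // "old" is not an integer
  }
}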

Just put printf.jar in your classpath (works with javac only!)

javac -classpath printf.jar Example.java

Since there are numerous #printf and #format methods in the Java API, any method with one of said names and with parameters of type String and Object[] is checked. (A cleaner solution would use a @Printf annotation to mark these methods. Alas, I cannot touch the Java API just like that.)

The above plugin works with javac only. Why? Annotation processing is done before type attribution, and thus type information is not available to a JSR 269 plugin. However, checking the types of format arguments cannot be done without type information. Thus, what I do is manually invoke the compiler’s type attribution phase. This is done using the internal Attr component as follows.

import java.util.Set;

import javax.annotation.processing.AbstractProcessor;
import javax.annotation.processing.ProcessingEnvironment;
import javax.annotation.processing.RoundEnvironment;
import javax.annotation.processing.SupportedAnnotationTypes;
import javax.lang.model.element.Element;
import javax.lang.model.element.ElementKind;
import javax.lang.model.element.TypeElement;

import com.sun.tools.javac.code.Symbol.ClassSymbol;
import com.sun.tools.javac.comp.Attr;
import com.sun.tools.javac.processing.JavacProcessingEnvironment;

@SupportedAnnotationTypes("*")
public class Printf extends AbstractProcessor {

    private Attr attr;

    public synchronized void init(ProcessingEnvironment processingEnv) {
        // Grab the compiler-internal Attr component, so that we can run
        // type attribution ourselves (hence: javac only).
        attr = Attr.instance(((JavacProcessingEnvironment) processingEnv).getContext());
        ...
    }

    public boolean process(Set<? extends TypeElement> annotations, RoundEnvironment roundEnv) {
        for (Element each : roundEnv.getRootElements()) {
            if (each.getKind() == ElementKind.CLASS) {
                // Manually attribute the class, so that type information is
                // available when checking the #printf and #format calls.
                attr.attribClass(null, (ClassSymbol) each);
                ...
            }
        }
        return false; // do not claim any annotations
    }

    ...

PS, type information is discarded after each annotation processor round. It is thus safe to apply type-manipulating changes to the AST from within an annotation processor: after the plugins have been run, type information is re-attributed from scratch.

For the full sources, please refer to /Sources/b/Printf on the SCG repository.

Have fun!
