Random notes on software, programming and languages.
By Adrian Kuhn

Archive for the ‘JExample’ Category

Unit Tests as First-class Entity in Programming Languages


Monday, November 23rd, 2009

To free unit tests from the slavery of objects, hooray!

Languages should support unit tests as first-class entities. In a recent blog post Cedric Beust, creator of TestNG, advocated the introduction of a unittest keyword that would allow test methods to be put in the class under test rather than in a separate file. He pointed to the D language doing this.

In fact, the Noop language takes first-class unit testing even further:

We believe that test code looks different from production code. Tests should be very simple, with minimal conditional logic, so that they may be read as documentation. […] A test is an entity, distinct from a class or method whose purpose is to run tests. […] suite is just another test block enclosing some tests, and may be arbitrarily nested, and divided among several files.

RSpec and friends are of course a further example of first-class unit testing. Again, tests are distinct from classes or methods and are enclosed in possibly nested contexts. As in

describe Bowling do
  it "should score 0 for gutter game" do
    bowling = Bowling.new
    20.times { bowling.hit(0) }
    bowling.score.should == 0
  end
end

In the following, I present my vision of 1st-class unit-tests.

Organizing tests in methods and classes is silly. In fact, organizing unit tests in methods and classes is an artifact of Smalltalk’s development environment. (Yes kids, SUnit predates JUnit.) In Smalltalk, the only code that can be edited is method bodies. So let’s see what we need to run tests:

  • an example
  • that runs in
  • a context

An example specifies part of the unit under test. An example should be executable code, such that we can automatically check if the unit under test implements the specification. When example code runs, it runs within a given context (more on that later). The example code can access all state of the context, which thus provides the test fixture.

Let’s nail down the distinction between “example” and “test” first.

Just as class and method are static templates for the runtime entities object and (method) activation, test code is the static template for the runtime entity example. And just as classes and methods are essentially the same kind of thing, as are objects and activations, an example is the same kind of thing as an object or an activation: a chunk of memory with enough space for each fixture variable and a pointer to its origin.

We can summarize this in a table to make the analogy to classes and methods more apparent.

Template   Realization   Entity       Context
Class      creating      Object       Instance of outer class
Method     invoking      Activation   Activation of calling context
Test       running       Example      Cloned example of producer

Upon execution of a test, a new example is allocated, linked to the example of its context, and then the test code is executed. So the context of a test is just the example that resulted from another test execution. By “resulted from” we mean the state of the example’s memory chunk after that other test has terminated. We refer to that other test as “producer” since it provides the fixture for the to-be-run “consumer” test.
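To make the idea a bit more tangible, here is a rough Java sketch of an example as "a chunk of state plus a pointer to its origin". All names (Example, run, the "size" fixture variable) are made up for illustration and are not part of JExample or any other framework. Note that the copy is shallow, so values stored in the fixture would still be shared between producer and consumer.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Consumer;

// Hypothetical model, not the JExample API: an example holds the fixture
// variables and a pointer to the example it was derived from.
final class Example {
    final Example origin;               // the producer's example, or null
    final Map<String, Object> fixture;  // fixture variables

    Example(Example origin, Map<String, Object> fixture) {
        this.origin = origin;
        this.fixture = fixture;
    }

    // Running a test allocates a new example that starts from a (shallow)
    // copy of the context's state, then executes the test code on it.
    static Example run(Consumer<Map<String, Object>> test, Example context) {
        Map<String, Object> state = new HashMap<>();
        if (context != null) {
            state.putAll(context.fixture);
        }
        test.accept(state);
        return new Example(context, state);
    }

    public static void main(String[] args) {
        Example empty = run(s -> s.put("size", 0), null);
        Example pushed = run(s -> s.put("size", (Integer) s.get("size") + 1), empty);
        System.out.println(empty.fixture.get("size"));   // producer state is untouched
        System.out.println(pushed.fixture.get("size"));
        System.out.println(pushed.origin == empty);
    }
}
```

Here the producer is the test that sets size to zero, and the consumer is the test that increments it on its own copy.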

So how are producer and consumer linked to each other?

Here we differ from classes and methods. Objects are (if your language-of-choice supports inner classes) linked by the lexical context of their classes. Activations are linked by the dynamic context of the executed methods. For testing, however, we need far more flexible ways of setting up the testing harness. We should be able to create examples based on any possible combination of tests.

Of course, nesting of providers as given in RSpec and Noop will be one very common case. But certainly as common will be the case where a chain of tests is executed multiple times, each time with a different outer-most provider. In JUnit this is typically achieved by subclassing a test class and overriding either setup or another helper method. An example of this case is when you have an abstract test class for Collection that tests the collection interface, and a test subclass for each concrete collection subclass that just creates another instance of that subclass.
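As a sketch of that second case without any subclassing, the chain of tests can simply be a function of the collection, and the outer-most providers can be a list of factories. This is plain Java with hypothetical names (CollectionChain, providers, runChain), not how JUnit or JExample do it.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.LinkedList;
import java.util.List;
import java.util.TreeSet;
import java.util.function.Supplier;

final class CollectionChain {
    // Outer-most providers: each creates a fresh instance of one concrete class.
    static List<Supplier<Collection<String>>> providers() {
        List<Supplier<Collection<String>>> list = new ArrayList<>();
        list.add(ArrayList::new);
        list.add(LinkedList::new);
        list.add(TreeSet::new);
        return list;
    }

    // The chain of tests: empty -> add -> remove, each step building on
    // the state left behind by the previous one.
    static void runChain(Collection<String> c) {
        check(c.isEmpty(), "new collection should be empty");
        c.add("foo");
        check(c.size() == 1 && c.contains("foo"), "add should store the element");
        c.remove("foo");
        check(c.isEmpty(), "remove should empty the collection");
    }

    static void check(boolean ok, String message) {
        if (!ok) throw new AssertionError(message);
    }

    // Runs the same chain once per provider and returns the number of runs.
    static int runAll() {
        int runs = 0;
        for (Supplier<Collection<String>> provider : providers()) {
            runChain(provider.get());
            runs++;
        }
        return runs;
    }

    public static void main(String[] args) {
        System.out.println(runAll() + " chains passed");
    }
}
```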

However, we might think of more complex setups, for example a test harness that uses the same graph of tests to test the combination of n server examples cross m client examples.

So what we need is a flexible way to connect tests with the examples of other tests.

Think of JExample 2.0 as a test harness DSL.

This DSL will be provided by tool builders. What the language has to provide is first-class entities for tests, examples and test running (which takes as parameters a test and an example). Of course, the language should also allow cloning of both examples and objects, such that tool builders can avoid side-effects when multiple tests expand on the same example.

Shoulda use this in Pharo


Wednesday, November 11th, 2009

Phexample is the new black in unit testing for Smalltalk. It extends SUnit with two features: test dependencies, and RSpec-like expectation matchers. Test dependencies are covered on Niko’s blog, who’s developing Phexample with me, as well as on the JExample website. I shall thus focus on expectation matchers here.

Expectation matchers let you set expectations on your objects. Expectations are also useful if you just use plain old SUnit test cases. They throw the same test failure as SUnit’s assertions, but are more readable in the source code and also produce more readable failure messages.

Let’s start with an example.

Get yourself the latest Pharo image and

Gofer it
    squeaksource: 'phexample';
    addPackage: 'Phexample';
    load

then open a workspace and evaluate

stack := Stack new.
stack size should = 0.

this creates a new stack and asserts that its size should be zero. For the sake of the example, let’s evaluate a failing expectation

stack push: 42.
stack top should = 4711.

which raises TestFailure: expected 4711 but got 42 (using =).

There are expectation matchers for all basic comparisons: greater than, less than, et cetera. If you need to negate a comparison just write

stack top should not = 4711.
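For readers who do not speak Smalltalk, here is a loose Java analogue of what such a matcher does under the hood. It is only a sketch with made-up names (Expectation, expect, toEqual) and says nothing about how Phexample is actually implemented. Only the wording of the non-negated failure message is borrowed from the example above.

```java
import java.util.Objects;

// Hypothetical Java analogue of an expectation matcher.
final class Expectation<T> {
    private final T actual;
    private final boolean negated;

    private Expectation(T actual, boolean negated) {
        this.actual = actual;
        this.negated = negated;
    }

    static <T> Expectation<T> expect(T actual) {
        return new Expectation<>(actual, false);
    }

    // Flips the expectation, similar in spirit to "should not".
    Expectation<T> not() {
        return new Expectation<>(actual, !negated);
    }

    // Fails when the comparison result does not agree with the expectation.
    void toEqual(T expected) {
        boolean matches = Objects.equals(actual, expected);
        if (matches == negated) {
            String prefix = negated ? "expected not " : "expected ";
            throw new AssertionError(prefix + expected + " but got " + actual + " (using =)");
        }
    }

    public static void main(String[] args) {
        expect(42).toEqual(42);          // passes
        expect(42).not().toEqual(4711);  // passes
        try {
            expect(42).toEqual(4711);    // fails
        } catch (AssertionError e) {
            System.out.println(e.getMessage());
        }
    }
}
```

The point is that the matcher knows both values and the comparison used, which is what makes the failure message readable.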

There are special matchers to set expectations on boolean return values.

stack isEmpty should be true.

alternatively you can write

stack should be isEmpty.

which often reads like LOLCAT SPEAK, but works quite nicely with selectors that do not include a verb such as

42 should be even.

Note: matching of boolean expectations is an open issue. We would love to allow you to write stack should be empty, which would not only be more readable but would also allow us to provide better failure messages since we know that you wanted to test isEmpty. However, we fear that breaking Pharo’s senders-of feature as well as rename and other refactorings might not be worth the added value in readability.

We welcome your feedback on this issue. For example, Oscar suggested using

stack isEmpty wouldyaknow

however, we are not sure how serious this suggestion is to be taken :)

Back on topic.

If you expect some code to raise an error, just write

stack := Stack new.
[ stack pop ] should signal: Error

and to check the error message

[ stack pop ] should signal: Error withMessageText: 'this stack is empty'.

or even

[ stack pop ] should signal: Error withMessageText: [ :m |
    m should beKindOf: String.
    m isEmpty should not be true.
    m should endWith: 'is empty' ]

which leads us to more matchers, such as

stack should beKindOf: Collection.

which sets an expectation on the type of an object.

Note: it seems sensible to add more expectations that match the dynamic type of objects, such as duck typing and responds-to. We plan on doing this, please let us know if you have a specific use case.

string should startWith: prefix.
string should endWith: suffix.
string should matchRegex: regexString.

are some expectations that you can set on strings.

Certainly there are more common expectations on basic types such as strings and collections. Again, please let us know if you have a specific use case. One of the things we want to do with Phexample is to be driven by user needs rather than planning upfront which features you might need (and nevertheless always guessing wrong…)

A last one, suggested by Lukas. If you expect some code to run within a given duration, just write

[ ... ] should runWithin: 20 milliSeconds.

which aborts with a failure if the given code takes longer than 20 milliseconds to run.
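A rough Java sketch of the same idea, assuming the simplest possible semantics: run the code to completion, measure, and fail afterwards if it took too long. Whether Phexample interrupts the code as soon as the limit is exceeded is not something this sketch tries to mirror. The names RunWithin, shouldRunWithin and sleepQuietly are made up.

```java
import java.time.Duration;

final class RunWithin {
    // Runs the code, then fails if it took longer than the given limit.
    // Unlike a real timeout, this never interrupts the running code.
    static void shouldRunWithin(Duration limit, Runnable code) {
        long start = System.nanoTime();
        code.run();
        Duration elapsed = Duration.ofNanos(System.nanoTime() - start);
        if (elapsed.compareTo(limit) > 0) {
            throw new AssertionError("expected to run within " + limit.toMillis()
                    + " ms but took " + elapsed.toMillis() + " ms");
        }
    }

    // Helper for the demo: sleeps without forcing callers to handle
    // the checked InterruptedException.
    static void sleepQuietly(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        shouldRunWithin(Duration.ofSeconds(10), () -> { });   // passes
        try {
            shouldRunWithin(Duration.ofMillis(1), () -> sleepQuietly(50));
        } catch (AssertionError e) {
            System.out.println("failed as expected");
        }
    }
}
```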

You can find the full list of expectation matchers in the expecting protocols of the PhexMatcher class. All matchers are well covered with tests, thus for plenty of examples of their use just refer to the ForExampleMatcher class (which, of course, subclasses the Phexample class, thus all its test methods start with should..).

PS: current versions of Omnibrowser do not display a test icon for Phexample test methods. This bug has been reported and a fix provided and will thus soon be fixed in your Pharo.

Sneak Peek: Unit Test Dependencies


Monday, September 7th, 2009

Here is a preview of a recently submitted paper on unit test dependencies. Besides the main contribution, a case-study on Lea’s automatic API migration and improved defect localization with JExample, a survey of over 2,500 open source projects is presented. We used the database of the Sourcerer code search engine (by Sushil Bajracharya and Joel Ossher) to analyse how unit testing frameworks are used and extended. As an appetizer: every second project has no test suite, every fourth test suite uses mock objects, and every tenth test suite uses third-party extensions.

[Image: jexample-sneak-peak]

The preview was created with Wordle, an idea that I owe to Tom Zimmermann.

To Clone or not to Clone


Wednesday, August 26th, 2009

If two or more JExample tests depend on the same return value, something must be done to prevent side-effects. The default policy is to clone the cached return value before injection. More policies are available through the @Injection annotation.

Let’s consider an example with one producer and two consumers:

@RunWith(JExample.class)
@Injection(InjectionPolicy.NONE) // disables cloning, don't do this at home!
public class StackTest {

    @Test
    public Stack emptyStack() {
        return new Stack();
    }

    @Given("#emptyStack")
    public void shouldPushFoo(Stack stack) {
        stack.push("foo");
        assertEquals(1, stack.size());
    }

    @Given("#emptyStack")
    public void shouldPushBar(Stack stack) {
        stack.push("bar");
        assertEquals(1, stack.size());
    }

}

If we run this test class, either of the consumers #shouldPushFoo or #shouldPushBar (depending on the order of execution on your machine) will fail with “expected:<1> but was <2>” as error message.

Why?

The test framework runs the producer #emptyStack once and caches its return value. When the first consumer is about to run, it is called with the cached return value as parameter. The consumer then modifies the provided value in order to execute its test. However, if no special measures are taken, this modification will be visible to the second consumer, which is later called with the same cached return value!
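Stripped of any framework, the effect boils down to two callers mutating one shared object. The following plain Java sketch (the class name TaintedCache is made up) mimics what happens when the cached return value is injected as-is:

```java
import java.util.Stack;

final class TaintedCache {
    // Simulates one producer whose return value is cached and then handed,
    // without copying, to two consumers that each push one element.
    static int[] sizesSeenByConsumers() {
        Stack<String> cached = new Stack<>();   // producer runs once

        int[] sizes = new int[2];

        cached.push("foo");                     // first consumer
        sizes[0] = cached.size();

        cached.push("bar");                     // second consumer, same object
        sizes[1] = cached.size();

        return sizes;
    }

    public static void main(String[] args) {
        int[] sizes = sizesSeenByConsumers();
        System.out.println("first consumer sees size " + sizes[0]);
        System.out.println("second consumer sees size " + sizes[1]);
    }
}
```

The first consumer sees a stack of size 1, the second one a stack of size 2, which is exactly the failure reported above.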

JExample offers three strategies to avoid tainted injection values:

  • InjectionPolicy.CLONE uses the #clone method to clone all provided return values. If cloning fails for a value (typically because the value does not implement Cloneable), its provider is rerun to obtain an untainted value.

  • InjectionPolicy.DEEPCOPY uses internal reflection to create a field-by-field copy of all provided injection values and all objects reachable through these values (i.e. a deep copy). This strategy does not call the #clone method and thus even copies values that don’t implement Cloneable. Use with caution: this strategy might cause the VM to die without further notice, since some objects are just not meant to be copied in that radical way.

  • InjectionPolicy.RERUN reruns all providers whenever injection values are required. This strategy is most stable, however you lose the benefit of faster test execution.

By default, JExample uses the CLONE policy.
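To illustrate the "clone, and rerun the provider if cloning fails" behaviour described for the CLONE policy, here is a small sketch in plain Java. It is an assumption about how such a fallback could be written, not JExample's actual code, and the names CloneOrRerun and freshValue are hypothetical.

```java
import java.lang.reflect.Method;
import java.util.Stack;
import java.util.function.Supplier;

final class CloneOrRerun {
    // Returns an untainted value: a clone of the cached one if it can be
    // cloned through a public clone() method, otherwise a value obtained
    // by rerunning the provider.
    static <T> T freshValue(T cached, Supplier<T> provider) {
        if (cached instanceof Cloneable) {
            try {
                Method clone = cached.getClass().getMethod("clone");
                @SuppressWarnings("unchecked")
                T copy = (T) clone.invoke(cached);
                return copy;
            } catch (ReflectiveOperationException e) {
                // No accessible clone() or it threw: fall back to rerunning.
            }
        }
        return provider.get();
    }

    public static void main(String[] args) {
        Stack<String> cached = new Stack<>();
        Stack<String> first = freshValue(cached, Stack::new);
        first.push("foo");
        Stack<String> second = freshValue(cached, Stack::new);
        second.push("bar");
        System.out.println(first.size() + " " + second.size() + " " + cached.size());
    }
}
```

With this in place both consumers get their own stack, and the cached value stays empty.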

Lookup of injection policies is resolved as follows: first JExample looks at the consumer method, then at the current class and all its superclasses, then at the current package and the package of all superclasses, and eventually it defaults to InjectionPolicy.DEFAULT (which, by default, defaults to InjectionPolicy.CLONE).
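The lookup order can be sketched with a bit of reflection. The annotation and enum below (UsePolicy, Policy) are stand-ins invented for this sketch so that it compiles on its own. They are not the real @Injection and InjectionPolicy types, and the real lookup may differ in detail.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;

// Stand-ins for illustration only.
enum Policy { CLONE, DEEPCOPY, RERUN, NONE }

@Retention(RetentionPolicy.RUNTIME)
@Target({ElementType.METHOD, ElementType.TYPE, ElementType.PACKAGE})
@interface UsePolicy { Policy value(); }

final class PolicyLookup {
    static Policy resolve(Class<?> testClass, Method consumer) {
        // 1. the consumer method itself
        UsePolicy onMethod = consumer.getAnnotation(UsePolicy.class);
        if (onMethod != null) return onMethod.value();
        // 2. the current class and all its superclasses
        for (Class<?> c = testClass; c != null; c = c.getSuperclass()) {
            UsePolicy onClass = c.getAnnotation(UsePolicy.class);
            if (onClass != null) return onClass.value();
        }
        // 3. the current package and the packages of all superclasses
        for (Class<?> c = testClass; c != null; c = c.getSuperclass()) {
            Package p = c.getPackage();
            UsePolicy onPackage = (p == null) ? null : p.getAnnotation(UsePolicy.class);
            if (onPackage != null) return onPackage.value();
        }
        // 4. the default
        return Policy.CLONE;
    }

    // Convenience for the demo: finds the first method with the given name
    // in the class hierarchy and resolves its policy.
    static Policy resolve(Class<?> testClass, String methodName) {
        for (Class<?> c = testClass; c != null; c = c.getSuperclass()) {
            for (Method m : c.getDeclaredMethods()) {
                if (m.getName().equals(methodName)) return resolve(testClass, m);
            }
        }
        throw new IllegalArgumentException("no such method: " + methodName);
    }
}

@UsePolicy(Policy.RERUN)
class BaseTests {
    void inherited() { }
}

class DerivedTests extends BaseTests {
    @UsePolicy(Policy.NONE)
    void annotated() { }

    void plain() { }
}

class UnannotatedTests {
    void plain() { }
}
```

For instance, resolving DerivedTests' annotated method finds the policy on the method itself, while its plain method picks up the policy declared on the superclass BaseTests.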

Injection policies are available since JExample revision 374, released in August 2009. You can obtain the latest version either as plain JAR file or as Eclipse library (update site).

5 Minute Guide to JExample


Tuesday, April 29th, 2008

JExample is an extension of JUnit that improves defect localization. JExample introduces first-class dependencies. If test B depends on A, the return value of A can be used as B’s fixture. And if test A fails, then B and all other dependents of A are skipped and marked as white. In an upcoming XP 2008 paper, we show that JExample improves performance and defect localization compared to plain JUnit tests.

In this tutorial, we are going to write a simple JExample test. We write a test for Java’s Stack class. Testing a stack is not trivial: there is an intrinsic dependency between pushing and popping elements. An element cannot be popped without pushing it first. Intrinsic dependencies are evil. If there is a bug in push, both push’s and pop’s tests fail, even if there is no bug in pop! In a large project, it may happen that a single bug causes dozens if not hundreds of tests to fail. Usually, only one of these tests covers the actual failure, whereas all others are false positives.

Download either the JAR file or use the Eclipse update-site to install the library plug-in.

We start with a simple test case, annotated with @RunWith. This annotation is provided by JUnit as an extension point for plugins. To run the test case with JExample, we pass JExample.class as the annotation’s value.

  import ch.unibe.jexample.JExample;
  import org.junit.runner.RunWith;

  @RunWith(JExample.class)
  public class StackTest { 

  }

Next we create our first test method: testEmpty creates an empty stack and asserts that its size is indeed zero. As JExample extends JUnit, the test method is annotated with @Test. But unlike with JUnit, the test yields its unit under test as return value! When executing the method, JExample caches that value for later use.

  import static org.junit.Assert.*;
  import java.util.Stack;
  import org.junit.Test;

  @Test
  public Stack<Integer> testEmpty() {
      Stack<Integer> stack = new Stack<Integer>();
      assertEquals(0, stack.size());
      return stack;
  }

Now, we are going to write a consumer test that uses the above return value as its fixture: testPush takes an empty stack as input parameter, adds an element and, again, returns the unit under test. The dependency is declared using JExample’s @Given annotation. To inject the result of testEmpty, we pass "#testEmpty" as the annotation’s value. When executing the method, JExample will pass the cached result of testEmpty as input parameter to testPush. This is called dependency injection.

  import ch.unibe.jexample.Given;

  @Test
  @Given("#testEmpty")
  public Stack<Integer> testPush(Stack<Integer> stack) {
      stack.push(42);
      assertFalse(stack.isEmpty());
      return stack;
  }

The same is done for testPop, which depends on testPush.

  @Test
  @Given("#testPush(java.util.Stack)")
  public void testPop(Stack<Integer> stack) {
      int element = stack.pop();
      assertEquals(42, element);
      assertTrue(stack.isEmpty());
  }

Now, let’s assume there is a bug in Stack.push(). The framework will run testEmpty, inject its return value into testPush, run it—which fails!—and thus skip testPop. Given these results we can quickly locate the bug.
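The skipping behaviour can be mimicked in a few lines of plain Java. The following is a toy runner written for this explanation only: MiniRunner and its methods are hypothetical, it knows nothing about annotations, and it injects cached values without cloning them.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

final class MiniRunner {
    enum Result { PASSED, FAILED, SKIPPED }

    private static final class Node {
        final String producer;                  // name of the test we depend on, or null
        final Function<Object, Object> body;    // takes the injected value, returns the fixture

        Node(String producer, Function<Object, Object> body) {
            this.producer = producer;
            this.body = body;
        }
    }

    private final Map<String, Node> tests = new LinkedHashMap<>();
    private final Map<String, Result> results = new LinkedHashMap<>();
    private final Map<String, Object> cache = new HashMap<>();

    MiniRunner add(String name, String producer, Function<Object, Object> body) {
        tests.put(name, new Node(producer, body));
        return this;
    }

    Map<String, Result> runAll() {
        for (String name : tests.keySet()) run(name);
        return results;
    }

    private Result run(String name) {
        Result known = results.get(name);
        if (known != null) return known;        // every test runs at most once
        Node node = tests.get(name);
        if (node == null) throw new IllegalArgumentException("unknown test: " + name);
        Object input = null;
        if (node.producer != null) {
            // A consumer is skipped unless its producer passed.
            if (run(node.producer) != Result.PASSED) {
                results.put(name, Result.SKIPPED);
                return Result.SKIPPED;
            }
            input = cache.get(node.producer);
        }
        try {
            cache.put(name, node.body.apply(input));
            results.put(name, Result.PASSED);
        } catch (AssertionError | RuntimeException e) {
            results.put(name, Result.FAILED);
        }
        return results.get(name);
    }

    public static void main(String[] args) {
        Map<String, Result> results = new MiniRunner()
                .add("testEmpty", null, in -> new java.util.Stack<Integer>())
                .add("testPush", "testEmpty", in -> { throw new AssertionError("bug in push"); })
                .add("testPop", "testPush", in -> in)
                .runAll();
        System.out.println(results);
    }
}
```

With the simulated bug in testPush, only testPush is reported as failed, and testPop is skipped rather than failing as well.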

More information is available on the JExample wiki pages.
