Twitter icon.
Random notes on software, programming and languages.
By Adrian Kuhn

Archive for April, 2008

5 Minute Guide to JExample


Tuesday, April 29th, 2008

JExample is an extensions of JUnit that improves defect localizaton. JExample introduces first-class dependencies. If test B depends on A, the return value of A can be used as B’s fixture. And if test A fails, then B and all other dependents of A are skipped and marked as white. In an upcoming XP 2008 paper, we show that JExample improves performance, and defect localization compared to plain JUnit tests.

In this tutorial, we are going to write a simple JExample test. We write a test for Java’s Stack class. Testing a stack is not trivial, there is an intrinsic dependency between pushing and popping elements. An element can not be popped without pushing it first. Intrinsic dependencies are evil. If there is a bug in push, both push’s and pop’s test fail, even if there is no bug in pop! In a large project, it may happen that a single bug causes dozens if not hundreds of tests to fail. Usually, only one of these tests covers the actual failure, whereas all others are false positives.

Download either the JAR file or use the Eclipse update-site to install the library plug-in.

We start with a simple test case, annotated with @RunWith. This annotations is provided by JUnit as an extension point for plugins. To run the test case with JExample, we pass JExample.class as the annotation’s value.

  import ch.unibe.jexample.JExample;
  import org.junit.runner.RunWith;

  @RunWith(JExample.class)
  public class StackTest { 

  }

Next we create our first test method: testEmpty creates an empty stack and asserts that its size is indeed zero. As JExample extends JUnit, the test method is annotated with @Test. But unlike with JUnit, the test yields its unit under test as return value! When executing the method, JExample caches that value for later use.

  import static org.junit.Assert.*;
  import java.util.Stack;
  import org.junit.Test;

  @Test
  public Stack<Integer> testEmpty() {
      Stack<Integer> stack = new Stack<Integer>();
      assertEquals(0, stack.size());
      return stack;
  }

Now, we are going to write a consumer test that uses the above return value as its fixture: testPush takes an empty stack as input parameter, adds an element and, again, returns the under under test. The dependency is declared using JExample’s @Given annotation. To inject the result of testEmpty, we pass "testEmpty" as the annotation’s value. When executing the method, JExample will pass the cached result of testEmptyStack as input parameter to testPush. This is called dependency injection.

  import ch.unibe.jexample.Given;

  @Test
  @Given("#testEmpty")
  public Stack<Integer> testPush(Stack<Integer> stack) {
      stack.push(42);
      assertFalse(stack.isEmpty());
      return stack;
  }

The same is done for testPop, which depends on testPush.

  @Test
  @Given("#testPush(java.util.Stack)")
  public void testPop(Stack<Integer> stack) {
      int element = stack.pop();
      assertEquals(42, element);
      assertTrue(stack.isEmpty());
  }

Now, let’s assume there is a bug in Stack.push(). The framework will run testEmpty, inject its return value into testPush, run it—which fails!—and thus skip testPop. Given these results we can quickly locate the bug.

More information is available on the JExampe wiki pages

How to write an Executable Grammar


Monday, April 21st, 2008

There has been quite some noise about executable grammars recently. Executable grammars are also known as parser combinators. In today’s installment I will show you how to write your own executable grammar. You will see that it is as simple as using a parser generator like Yacc and Antlr. Actually, I’d say it is even simpler. Me myself, I never figured out how to use these parser generator beasts. They are just too arcane. Yet, I wrote my own executable grammar in less than 3 hours.

In a first step, we gonna write an executable grammar for the expression

raining & ( dogs | cats ) *

There we go.

The basic idea of an executable grammar is to build a parser’s grammar as a tree of objects that know how to parse their grammar production. How? Just open the famous Design Pattern book on page 243, and the recipe is at your fingertips: the Interpreter pattern.

Interpreter pattern.

We will step through that pattern today, implementing a scanner-free parsing expression grammar. But there is more magic around the corner. In a follow up we will expose our parser combinators as a DSL, and eventually, use the black art of memoization to turn them into a full-fledged packrat parser running in linear time.

Today’s language of choice is Ruby, so we start with:

class Expression
  def initialize(regexp)
    @regexp = regexp
  end
  def match(input)
    input.scan @regexp
  end
end

The class Expression is the basic building block of our executable grammar. Each instance of Expression knows how to parse itself, and is thus a little parser of its own. So far, expressions are nothing more than glorified regular expressions. Later, we will add subclasses to match sequence, alternation and repetition patterns.

We must require ’strscan’ as we are using StringScanner#scan to match regular expressions. StringScanner keeps a scan pointer. When scanning for a regular expression, the match must occur at the character after the scan pointer. If a regular expression matches, the scan pointer moves just beyond the last character of the match, ready to scan again from the next character onwards. In other words, StringScanner consumes the input.

require 'strscan'

Example 1: the following code will successfully match the first token of our sample input.

input = StringScanner.new "raining cats cats dogs cats"
grammar = Expression.new /raining/
p grammar.match(input)

So far our expressions are quite useless for real parsing. A real parser must not only consume its input on success, but also backtrack on failure. Not to mention such fancy magic as whitespace skipping.

In order to backtrack on failure, we are going to rewrite Expression#match. This is done by saving the input’s position before attempting to perform a match, and resetting it upon failure. Hence, we rewrite the match method as follows

  save = input.pos
  match = input.scan @regexp
  input.pos = save if match.nil?
  return match

As we are later going to redefine match in each subclass, it would be quite painstaking to repeat this boilerplate code all the time. Patterns to the rescue! There is a nice pattern to be applied here, the Template Method pattern (GOF p 325). We delegate the actual matching to a template method called private_match, which subclasses may override to redefine the actual matching.

class Expression
  def match(input)
    save = input.pos
    input.skip_whitespace
    match = self.private_match(input)
    input.pos = save if match.nil?
    return match
  end
  def private_match(input)
    input.scan @regexp
  end
end

Above, we are even getting the whitespace magic done! Before any attempt to perform private_match, leading whitespace if skipped with a call to StringScanner#skip_whitespace. As StringScanner does not offer such a method by default, we extend the class dynamically as follows:

class StringScanner
  def skip_whitespace
    self.skip /\s+/
  end
end

Next, we are going to subclasses Expression to match sequence, ordered-choice and repetition patterns. This might sound scarier than it actually is. We just follow the recipe of the Interpreter pattern (GOF p 243), adding a new subclass for each kind of pattern.

SequenceExpression matches each of its child-expressions in sequence. If any of the child-expressions fails, the whole expression fails and backtracks.

class SequenceExpression < Expression
  def initialize
    @children = []
  end
  def & expr
    @children << expr; self
  end
  def private_match(input)
    @children.collect { |expr|
      match = expr.match input
      return if match.nil?
      match
    }
  end
end

OrderedChoiceExpression will return the first successful match among its child-expressions. If none of the child-expressions succeeds, the whole expression fails.

NB: for those familiar with context free grammars, there is a gotcha here. The ordered-choice pattern of PEG grammars differs from a CFG’s alternation pattern. Given a series of alternatives, PEG will take the first match, where as CFG will walk both paths and choose the fullest match. For a CFG, E = (ab | a) and F = (a | ab) are the same. For a PEG they differ: both branches of E possibly match, whereas the second branch of F will never match since the first branch is always taken.

class OrderedChoiceExpression < Expression
  def initialize
    @children = []
  end
  def | expr
    @children << expr; self
  end
  def private_match(input)
    @children.each { |expr|
      match = expr.match input
      return match if not match.nil?
    }
    return nil
  end
end

RepetitionExpression tries to match its child-expression as many times as possible. Even if the child-expression does not match at all, the whole expression succeeds.

class RepetitionExpression < Expression
  def initialize(expression)
    @child = expression
  end
  def private_match(input)
    matches = []
    while match = @child.match(input)
      matches << match
    end
    return matches
  end
end

Eventually, we apply the Factory Method pattern (GOF p 107) to add support for these patterns to the base class.

class Expression
  def & expr
    SequenceExpression.new & self & expr
  end
  def | expr
    OrderedChoiceExpression.new | self | expr
  end
  def star
    RepetitionExpression.new self
  end
end

Example 2: the following code will successfully match the entire grammar that has initially been given as an example. First, we create expressions for the terminals “raining”, “cats”, and “dogs”. Then, we combine them using our little factory methods, thus creating a full parser.

input = StringScanner.new("raining cats cats dogs cats")
raining = Expression.new /raining/
cats = Expression.new /cats/
dogs = Expression.new /dogs/
grammar = raining & (dogs | cats).star
p grammar.match(input)

And done!

Well, not completely done. In order to implement a full parsing expression grammar we must add four additional subclasses: OneOrMoreExpression, ZeroOrOneExpression, AndPredicate, and NotPredicate. Their implementation follows the same schema as RepetitionExpression. For details, please refer to the full source code of today’s installment.

Hello, worlds!


Saturday, April 12th, 2008

Welcome to my blog!

Eventually up and running. Setting the whole thing up has taken more time than planned, in particular doing my own Wordpress theme.

The title of this first post follows a programmer’s tradition. When learning a programming language, it has become tradition to write, as a first step, a program that prints the character sequence “Hello, World!” on the screen.

The shortest hello world program, written in the (admittedly rather esoteric) HQ9+ language, consists of a single letter.

H

Whereas the world’s largest hello world program ever, is this 160 by 160 meter semacode below, written in a wheat-field by German artist and programmer Bernd Hopfengärtner (via we make money not art).

Hello world semacode in a wheat-field near Ilmenau, Germany.

The most amazing hello world I’ve ever seen, however, uses curve fitting. A blogger named Poromenos recently presented a function f(x) that returns the ascii ordinal for each of the letters of “Hello world!”.

from math import *

def f(x):
    return int(round(96.75 + -21.98*cos(x*1.118) + 13.29*sin(x*1.118) + -8.387*cos(2*x*1.118)\
               + 17.94*sin(2*x*1.118) + 1.265*cos(3*x*1.118) + 16.58*sin(3*x*1.118)\
               + 3.988*cos(4*x*1.118) + 8.463*sin(4*x*1.118) + 0.3583*cos(5*x*1.118)\
               + 5.878*sin(5*x*1.118)))

print "".join([chr(f(x)) for x in range(12)])

If you are still not bored, you may check out the real cost of hello world by Coding Horror’s Jeff Atwood or this, rather unusual, hello world for startup wannabes by one Jason Brownlee.

cheers,
AA

For.example is Digg proof thanks to caching by WP Super Cache!