Twitter icon.
Random notes on software, programming and languages.
By Adrian Kuhn

Archive for the ‘DSL’ Category

Check printf at compile-time

Monday, March 16th, 2009

Since the introduction of #printf in Java 1.5, I always wished to check my format strings and their arguments at compile-time.

Now, it can be done.

The following compiler-plugin checks format string and format arguments of any call to #printf and #format. Of course, checking is limited to format-strings known at compile time.

Just put printf.jar in your classpath (works with javac only!)

javac -classpath printf.jar

Since there are numerous #printf and #format methods in the Java API, any method with one of said names and with parameters of type String and Object[] is checked. (A cleaner solution would use a @Printf annotation to mark these methods. Alas, I cannot touch the Java API just like that.)

The above plugin works with javac only. Why? Annotation processing is done before type attribution and thus type information is not available to a JSR 269 plugin. However, checking the type of format arguments cannot be done without type information. Thus, what I do is to manually invoke the compiler’s type attribution phase. This is done using the internal Attr component as follows.


public class Printf extends AbstractProcessor {

    private Attr attr;

    public synchronized void init(ProcessingEnvironment processingEnv) {
        attr = Attr.instance(((JavacProcessingEnvironment) processingEnv).getContext());

    public boolean process(Set annotations, RoundEnvironment roundEnv) {
        for (Element each: roundEnv.getRootElements()) {
            if (each.getKind() == ElementKind.CLASS) {
                attr.attribClass(null, (ClassSymbol) each);


PS, type information is discarded after each annotation processor round. It is thus safe to apply type manipulating changes to the AST from within an annotation processor, since after the plugins have been run, type information is re-attributed again from scratch.

For the full sources, please refer to /Sources/b/Printf on the SCG repository.

Have fun!

Foreach in 1 Billion with Index

Tuesday, December 9th, 2008

The previous post presented a DSL around Java’s foreach loop but did not fix its maybe most apparent limitation: the inability to access the index of a running loop.

The following solution is taken from Python’s enumerate function.

for (Each<String> word: withIndex(lorem)) {

This works the same way as the remaining ForEach DSL. We hook a custom Iterable into the loop and iterate over a series of wrappers rather than the original elements. Each wrapper has two fields, one containing the original element and one containing its index. Given a collection with 1 billion elements we would thus create 1 billion …

Err … creating ONE billion objects? This is Madness!

Let’s fix that.

While looping the wrappers are used sequentially only. One after the other. Why not reuse a single wrapper instance for all elements? And thus avoid creating another 999′999′999 wrappers? Clearly a difference that matters, I though. But a couple of benchmarks later I learned that creating a wrapper for each element is not significantly slower.

Apparently the JVM knows how to optimize its loops. This is a nice example how premature optimization is rendered futile given today’s virtual machine technology.

Actually, I wasn’t that surprised by these results. I already expected both approaches to be close, but still, it was impressive to see that the difference between both approaches is not statistically relevant. At least not with my primitive benchmarking capabilities. It is hard to write benchmarks that are neither too trivial nor shot themselves in the foot. Given all the optimizations going on under Java’s hood, I wont claim that mine do neither.


Updated sources of the ForEach DSL are available on SCG subversion.

Pimp my Foreach Loop

Thursday, December 4th, 2008

Java’s foreach loop is useful but limited. Neither can we access the iterator of a running loop, nor can we write Ruby-style1 foreach loops, such as

1) In fact, this style of looping can be traced back to the ancient times of legendary Smalltalk.

names = { |each| each.frist.upcase? }     

bool = words.any? { |each| each.size > 7 }

length = words.inject(0) { |sum,each| sum + each.size }

We all know, this is because Java is – even though in its 6th release – still without the most basic structure of a programming language: Clo-clo-closures! (And rumours are, they wont fix that for the 7th release either.)

There are workarounds. Typically, anonymous inner-classes are (ab)used as closures. I dont like that. All the syntactic clutter and implementation overhead of using full-blown classes, only to inject some simple behavior into a loop? IMHO, not worth it.

Instead, here is a small DSL that pimps ye old Java foreach loop for you:

for (Select<String> each: select(words)) {
  each.yield = each.value.charAt(0).isUppercase();
Collection<String> names = $result();
for (AnySatisfy<String> each: anySatisfy(words)) {
  each.yield = each.value.length() > 7;
boolean bool = $result();
for (Inject<Integer,String> each: inject(0,words)) {
  each.yield += each.value.length();
int length = $result();

How does it work?

Behind the scenes, Java’s foreach loop operates on an instance of Iterable. Thus we can hook a custom object into the loop and get called back upon starting and terminating the loop, as well as before and after each iteration.

In the first example, #select is a static method that wraps the given collection in a custom object. The custom object is of type Iterable<Select>, where Select has two fields. One field is used a input parameter, the other as output parameter. Before each iteration of the loop, value is populated with the current element. After each iteration, yield is polled for a boolean value. Inbetween, the loop is executed.

While running the loop, all elements for which the loop yields true are copied into a new collection. Upon terminating the loop, this collection is assigned to $result. To keep things thread-safe, the result is stored in a ThreadLocal variable .

The same technique is used in the two other examples. #anySatisfy checks if all iterations of the loop yield true. #inject combines all elements by injecting an accumulator value into each iteration of the loop.

The list of currently supported queries includes

  • AllSatisfy
  • AnySatisfy
  • Cardinality
  • Collect
  • Count
  • CutPieces
  • Detect
  • Fold
  • GroupedBy
  • IndexOf
  • Inject
  • Reject
  • Select

If you need more, you can subclass For<Each,This<Each>> and implement your own query. As an example, I shall leave the implementation of Count here:

public class Count<Each> extends For<Each,Count<Each>> {
  public Each value;
  public boolean yield;
  private int count;
  protected void afterEach() {
    if (yield) count++;
  protected Object afterLoop() {
    return count;
  protected void beforeLoop() {
    count = 0;
  protected void beforeEach(Each element) {
    value = element;
    yield = false;

As usual, the complete sources are available on SCG subversion.

Have fun!

Roman Numerals, in your Java

Wednesday, November 5th, 2008

Today’s hack is based on Lukas’s recent work on DSLs. Imagine you could compile and run a Java program that uses Roman numerals.

public class Example {
    public static void main(String[] args) {
            MCMLXXVII + XXIV

It can be done—put this 7.5 kb jar in your compiler’s classpath!

% javac -cp roman.jar
Note: 2 roman numerals processed.
% java Example

Obviously, this jar extends the syntax of Java … close, but not quite. The source code that contains the Roman numerals is already valid Java syntax. It is valid Java with undeclared variables! If you compile it without our small jar, the compiler will tell you that the Roman numerals are undeclared variable names. Thus, our small jar hooks into the compiler, bending JSR 269 beyond its limits, and replaces all roman variables names with integer literals.

Caveat, requires Java 6.

Below the rewrite rule: class Transform extends TreeTranslator, an internal class of the Java compiler. Transform visits all statements in the source code, and replaces each variable whose name matches a Roman numeral with an int literal of the same numeric value.

public class Transform extends TreeTranslator {
    public void visitIdent(JCIdent tree) {
        String name = tree.getName().toString();
        if (isRoman(name)) {
            result = make.Literal(numberize(name));
            result.pos = tree.pos;
        } else {

Our small jar implements an internal DSL, but an unusual one. We reuse the host language’s original syntax in a creative way that allows us to express new semantics. Lukas refers to this kind of language extension as “pidgin”, since human pidgin languages play in a similar way with natural language.

For the technically inclined, the complete source code is available on SCG subversion.


You may also like

ZIP Code Map of Switzerland

Wednesday, October 15th, 2008

Inspired by Robert Kosara’s US ZIP Scribble Map as well as Stefan Zeiger’s ZIP Code Map of Germany, I’ve done a small script that renders Switzerland as a ZIP code map.

Edit: obviously, Robert Kosara has also done a ZIP scribble map of Switzerland, but it is broken. On the US original he explores the correlations of 1000-blocks of ZIP codes and state boundaries. On his Swiss map however, he directly groups the lines by cantons! Which is obviously not the same since we have 26 cantons but only nine blocks of 1000 ZIP codes.

In particular interesting is how the red and brown ZIP code blocks are dividing the canton Valais by its two languages: French and German! (For foreigners: the valais is the huge valley in the south-east which is covered half by red lines and half by brown lines.)

The script is written in Pimon, a small visualization framework for VW Smalltalk. Of course, Pimon has a DSL! The layout- and drawing DSL of Pimon offers a fluent interface that targets the end-users of common Desktop applications. Figures are grouped, aligned and positioned as you would do it in an Desktop application, rather than forcing the programmer to wrap his head around vector algebra with all its x’s and y’s.

The following bunch of Pimon code specifies a tree layout (the same in Java):

TreeLevel >> arrange
    | group |
    group := Group new
        addAll: self treeChildren;
        horizontalWithGap: 4;
    Group new
        add: self treeParent;
        add: group;
        verticalWithGap: 16;
    self resizeToElements.

I started porting Pimon to Java last year, but only got 80% done and thus never released it to the public. However, it is sad to let code decay without being used and maybe the parts that are working turn out to be useful for your projects, thus I published it today:

The 6th Rule of Variable Naming

Tuesday, July 29th, 2008

Personally, I take particular care when naming methods. An often neglected detail is that an object’s public method names are almost ever written following the variable name holding an instance of the object. Thus, it is rather pointless to start all methods of a Factory class with the verb create, rather we can omit it and stick to the convention that its instances must be named create!

Factory create = new Factory();
Node n = create.Node();
Edge e = create.Edge();

In lack of a better name, I am calling this the Read-Aloud naming convention.

I have first stumbled upon this convention when reading the sources of the Java Compiler. Other occurrences known to me are restricted to the naming of static methods. For example Yossi Gil’s awesome Default class which returns a default value if a given variable is null.

String guess =, "A");

Another example of Read-Aloud naming is the headline of this blog, which refers to the creation of examples in the JExample testing framework.

Stack stack = For.example(StackTest.class, "withManyValues");

PS—if you are looking for the first five rules of variable naming, please refer to Ian Hickman’s recently digged 5 Rules of Variable Naming.

The Rabbit Will Die in Java

Monday, July 14th, 2008

Since some days, an army of mad rabbits has been invading my brain. DWEMTHY’S ARRAY, GIEV US DWEMTHY’S ARRAY, MOAR, MOAR SAUCE, MOAR!!!!!1!!1, they cry.

Finally, I have given in and did Dwemthy’s Array … in Java.

“In this game, you are a rabbit who is going to die. A dragon is going to do it. Deep in Dwemthy’s Array.” – why the lucky stiff

new Creature("Dragon") {{
    life = 1340;     // tough scales
    strength = 451;  // bristling veins
    charisma = 1020; // toothy smile
    weapon = 939;    // fire breath

More on this code in a moment.

For those unfamiliar with Dwemthy’s Array, it is a Ruby meta-programming example that is along the lines of a text-based game. The game has been put on Ward’s wiki as a challenge to Smalltalk advocates: “Go ahead and reimplement DwemthysArray!” Darren Hobbes and, recently, Nicholas Chen have taken the challenge. But never has nobody been so insane to met the challenge in the vast quagmires of Java meta-programming—not until, NOW.

*   *   *

You can not enter the Array unprepared, let us face the ScubaArgentine first:

new Creature("ScubaArgentine") {{
    life = 46;
    strength = 35;
    charisma = 91;
    weapon = 2;

Creature r = new Rabbit();
Creature s = new_("ScubaArgentine");

Now use the little boomerang!

r.send( '^', s )

>> [You hit with 2 points of damage!]
>> [Your enemy hit with 28 points of damage!]
>> [Rabbit has died.]

How does it work? We use double brace initialization to define an anonymous subclasses with pre-initialized fields. And, a custom binding mechanism is used to create instances of that class.

Even though most Java programmers are familiar with double brace initialization, I will explain it for the n00bs. The idiom combines two features of Java: anonymous inner classes and instance initializers (think anonymous constructors). Thus, the first brace creates a new anonymous inner class, the second declares an instance initializer block that is run when the anonymous inner class is instantiated.

Being anonymous, the inner class is not bound to any identifier. Therefore we can not simply use new to create new instances. Instead, we must setup a custom binding mechanism. The name “ScubaArgentine” is passed to Creature’s constructor, which binds name and anonymous class in a static lookup table. The same table is consulted by a static new_ method to create custom instances.

This is implemented in Creature as follows:

private static final Map<String, Class<? extends Creature>> INDEX = new TreeMap();
protected String name;

public Creature(String name) {
    INDEX.put(name, this.getClass()); = name;

public static Creature new_(String name) { try {
    return (Creature) INDEX.get(name).getDeclaredConstructors()[0].newInstance(name);
     catch(Exception e) { throw new RuntimeException(e); }}

Once you’re done with the example guy, it’s time to enter The Array.

You have six foes.

new Creature("IndustrialRaverMonkey") {{
    life = 46;
    strength = 35;
    charisma = 91;
    weapon = 2;

new Creature("DwarvenAngel") {{
    life = 540;
    strength = 6;
    charisma = 144;
    weapon = 50;

new Creature("AssistantViceTentacleAndOmbudsman") {{
    life = 320;
    strength = 6;
    charisma = 144;
    weapon = 50;

new Creature("TeethDeer") {{
    life = 655;
    strength = 192;
    charisma = 19;
    weapon = 109;

new Creature("IntrepidDecomposedCyclist") {{
    life = 901;
    strength = 560;
    charisma = 422;
    weapon = 105;

new Creature("Dragon") {{
    life = 1340;     // tough scales
    strength = 451;  // bristling veins
    charisma = 1020; // toothy smile
    weapon = 939;    // fire breath

Creature dwary = new DwemthysArray(new_("IndustrialRaverMonkey"),

In Ruby, DwemthysArray is a subclass of Array that delegates unknown methods to its first creature. Whenever the first creature dies, it is removed from the array, and thus, a new enemy emerges.

Alas, the implementation of DwemthysArray in Java is not as elegant. Why? Java has neither method_missing hook, nor does it support duck typing. And even worse, Java arrays are a built-in type which can not be subclasses. Thus, to keep the demons of static type checking at bay, we extend Creature and include the array as private array field only.

In order to let a new creature emerge whenever the current is found to be dead, we must intercept any boolean check that compares the life field against zero. As we can not override a field access, the Java implementation uses alive() wherever live <= 0 is used in Ruby.

Edit: Clinton Begin has refactored my code to be even MOAR META!!!. He implements DwemthysArray using a dynamic proxy, which is much closer to the original Ruby code.

Here, I shall unmask the Array Creature for you:

public class DwemthysArray extends Creature {

    private Creature[] array;

    public DwemthysArray(Creature... array) {
        this.array = array;

    public boolean alive() {
        if (life > 0) return true;
        if (array.length == 0) {
            puts( "[Whoa.  You decimated Dwemthy's Array!]" );
            return false;
        } else {
            life = array[0].life;
            strength = array[0].strength;
            charisma = array[0].charisma;
            weapon = array[0].weapon;
            name = array[0].name;
            array = Arrays.copyOfRange(array, 1, array.length);
            puts( "[Get ready. %s has emerged.]", );
            return true;


“Fight the Array and the monsters will appear as you go. Godspeed and may you return with harrowing tales and nary an angel talon piercing through your shoulder.”

Start here:

while (dwary.alive() && r.alive()) {
    puts( r );
    r.send( gets().charAt(0), dwary );

The remainder of the implementation is (except for some additional syntactic noise) almost a genuine port of the Ruby code. Reflection is used to dispatch Rabbit’s overloaded operators to methods and to mimic the output of Ruby’s inspect method. And there are a couple of static convenience methods, like puts and gets.

The full source code is available online.

For.example is Digg proof thanks to caching by WP Super Cache!