Twitter icon.
Random notes on software, programming and languages.
By Adrian Kuhn

Archive for December, 2008

Foreach in 1 Billion with Index


Tuesday, December 9th, 2008

The previous post presented a DSL around Java’s foreach loop but did not fix its maybe most apparent limitation: the inability to access the index of a running loop.

The following solution is taken from Python’s enumerate function.

for (Each<String> word: withIndex(lorem)) {
    …
    word.index;
    word.element;
    …
}

This works the same way as the remaining ForEach DSL. We hook a custom Iterable into the loop and iterate over a series of wrappers rather than the original elements. Each wrapper has two fields, one containing the original element and one containing its index. Given a collection with 1 billion elements we would thus create 1 billion …

Err … creating ONE billion objects? This is Madness!

Let’s fix that.

While looping the wrappers are used sequentially only. One after the other. Why not reuse a single wrapper instance for all elements? And thus avoid creating another 999′999′999 wrappers? Clearly a difference that matters, I though. But a couple of benchmarks later I learned that creating a wrapper for each element is not significantly slower.

Apparently the JVM knows how to optimize its loops. This is a nice example how premature optimization is rendered futile given today’s virtual machine technology.

Actually, I wasn’t that surprised by these results. I already expected both approaches to be close, but still, it was impressive to see that the difference between both approaches is not statistically relevant. At least not with my primitive benchmarking capabilities. It is hard to write benchmarks that are neither too trivial nor shot themselves in the foot. Given all the optimizations going on under Java’s hood, I wont claim that mine do neither.

 

Updated sources of the ForEach DSL are available on SCG subversion.

Pimp my Foreach Loop


Thursday, December 4th, 2008

Java’s foreach loop is useful but limited. Neither can we access the iterator of a running loop, nor can we write Ruby-style1 foreach loops, such as

1) In fact, this style of looping can be traced back to the ancient times of legendary Smalltalk.

names = words.select { |each| each.frist.upcase? }     

bool = words.any? { |each| each.size > 7 }

length = words.inject(0) { |sum,each| sum + each.size }

We all know, this is because Java is – even though in its 6th release – still without the most basic structure of a programming language: Clo-clo-closures! (And rumours are, they wont fix that for the 7th release either.)

There are workarounds. Typically, anonymous inner-classes are (ab)used as closures. I dont like that. All the syntactic clutter and implementation overhead of using full-blown classes, only to inject some simple behavior into a loop? IMHO, not worth it.

Instead, here is a small DSL that pimps ye old Java foreach loop for you:

for (Select<String> each: select(words)) {
  each.yield = each.value.charAt(0).isUppercase();
}
Collection<String> names = $result();
for (AnySatisfy<String> each: anySatisfy(words)) {
  each.yield = each.value.length() > 7;
}
boolean bool = $result();
for (Inject<Integer,String> each: inject(0,words)) {
  each.yield += each.value.length();
}
int length = $result();

How does it work?

Behind the scenes, Java’s foreach loop operates on an instance of Iterable. Thus we can hook a custom object into the loop and get called back upon starting and terminating the loop, as well as before and after each iteration.

In the first example, #select is a static method that wraps the given collection in a custom object. The custom object is of type Iterable<Select>, where Select has two fields. One field is used a input parameter, the other as output parameter. Before each iteration of the loop, value is populated with the current element. After each iteration, yield is polled for a boolean value. Inbetween, the loop is executed.

While running the loop, all elements for which the loop yields true are copied into a new collection. Upon terminating the loop, this collection is assigned to $result. To keep things thread-safe, the result is stored in a ThreadLocal variable .

The same technique is used in the two other examples. #anySatisfy checks if all iterations of the loop yield true. #inject combines all elements by injecting an accumulator value into each iteration of the loop.

The list of currently supported queries includes

  • AllSatisfy
  • AnySatisfy
  • Cardinality
  • Collect
  • Count
  • CutPieces
  • Detect
  • Fold
  • GroupedBy
  • IndexOf
  • Inject
  • Reject
  • Select

If you need more, you can subclass For<Each,This<Each>> and implement your own query. As an example, I shall leave the implementation of Count here:

public class Count<Each> extends For<Each,Count<Each>> {
  public Each value;
  public boolean yield;
  private int count;
  protected void afterEach() {
    if (yield) count++;
  }
  protected Object afterLoop() {
    return count;
  }
  protected void beforeLoop() {
    count = 0;
  }
  protected void beforeEach(Each element) {
    value = element;
    yield = false;
  }
}

As usual, the complete sources are available on SCG subversion.

Have fun!

For.example is Digg proof thanks to caching by WP Super Cache!