Foreach in 1 Billion with Index
Tuesday, December 9th, 2008
The previous post presented a DSL around Java’s foreach loop but did not fix its maybe most apparent limitation: the inability to access the index of a running loop.
The following solution is taken from Python’s enumerate function.
for (Each<String> word: withIndex(lorem)) {
…
word.index;
word.element;
…
}
This works the same way as the remaining ForEach DSL. We hook a custom Iterable into the loop and iterate over a series of wrappers rather than the original elements. Each wrapper has two fields, one containing the original element and one containing its index. Given a collection with 1 billion elements we would thus create 1 billion …
Err … creating ONE billion objects? This is Madness!
Let’s fix that.
While looping the wrappers are used sequentially only. One after the other. Why not reuse a single wrapper instance for all elements? And thus avoid creating another 999′999′999 wrappers? Clearly a difference that matters, I though. But a couple of benchmarks later I learned that creating a wrapper for each element is not significantly slower.
Apparently the JVM knows how to optimize its loops. This is a nice example how premature optimization is rendered futile given today’s virtual machine technology.
Actually, I wasn’t that surprised by these results. I already expected both approaches to be close, but still, it was impressive to see that the difference between both approaches is not statistically relevant. At least not with my primitive benchmarking capabilities. It is hard to write benchmarks that are neither too trivial nor shot themselves in the foot. Given all the optimizations going on under Java’s hood, I wont claim that mine do neither.
Updated sources of the ForEach DSL are available on SCG subversion.