Twitter icon.
Random notes on software, programming and languages.
By Adrian Kuhn

Archive for the ‘Ruby’ Category

How do I track time? II


Wednesday, October 22nd, 2008

Recently, I showed how to use the command line and a ruby script to keep track of time. Due to popular demand, here is the full script (including time cloud generation).

track_my_time.rb

The script is used as follows:

% I am back from lunch
0d 0:46'32"
% I read mail
0d 0:12'05"
% I fixed bug 20446 in Some.app for Acme
0d 1:38'12"
% …

On monday, you can use --cloud to show the time cloud of last week. A time cloud is a tag cloud with all terms that appear in last week’s entries, weighted by frequency and duration of the containing entries. No stemming so far, but you can provide a stopword list in a separate file.

For example, my time cloud of last week looks as follows:

Installation instructions and more usage see script file.

How do I track time?


Thursday, October 2nd, 2008

I have a ruby script that appends a line with timestamp and all commandline arguments to a text file. Then I have an alias from “I” to this script, such that I can just type on the command line…

% I start working
0d 14:06'11"
% I have read mail
0d 00:27'45"
% I answered a question on stackoverflow.com
0d 00:11'32"
%

…and all that is recorded in a text file with timestamps.

Other scripts later use that text file to generate work reports, or just tag clouds of what I have done during the last weeks or months.

require 'date'
File.open('time.txt', 'a') do |f|
  f.puts "#{DateTime.now.strftime("%Y/%m/%d\t%H:%M:%S\t")}#{$*.join(' ')}"
end unless $*.empty?
back = DateTime.new
now = DateTime.new
File.open('time.txt') do |f|
  f.each do |line|
    unless line.strip.empty? then
      back = now
      now = DateTime.strptime(line,"%Y/%m/%d\t%H:%M:%S\t")
    end
  end
end
if $*.empty? then
  back = now
  now = DateTime.now.new_offset(of=0) + DateTime.now.offset # fix timezone!
end
diff = (now - back)
puts "#{diff.to_i}d #{(DateTime.new + (diff - diff.to_i)).strftime("%H:%M'%S\"")}"

The Rabbit Will Die in Java


Monday, July 14th, 2008

Since some days, an army of mad rabbits has been invading my brain. DWEMTHY’S ARRAY, GIEV US DWEMTHY’S ARRAY, MOAR, MOAR SAUCE, MOAR!!!!!1!!1, they cry.

Finally, I have given in and did Dwemthy’s Array … in Java.

“In this game, you are a rabbit who is going to die. A dragon is going to do it. Deep in Dwemthy’s Array.” – why the lucky stiff

new Creature("Dragon") {{
    life = 1340;     // tough scales
    strength = 451;  // bristling veins
    charisma = 1020; // toothy smile
    weapon = 939;    // fire breath
}};

More on this code in a moment.

For those unfamiliar with Dwemthy’s Array, it is a Ruby meta-programming example that is along the lines of a text-based game. The game has been put on Ward’s wiki as a challenge to Smalltalk advocates: “Go ahead and reimplement DwemthysArray!” Darren Hobbes and, recently, Nicholas Chen have taken the challenge. But never has nobody been so insane to met the challenge in the vast quagmires of Java meta-programming—not until, NOW.

*   *   *

You can not enter the Array unprepared, let us face the ScubaArgentine first:

new Creature("ScubaArgentine") {{
    life = 46;
    strength = 35;
    charisma = 91;
    weapon = 2;
}};

Creature r = new Rabbit();
Creature s = new_("ScubaArgentine");

Now use the little boomerang!

r.send( '^', s )

>> [You hit with 2 points of damage!]
>> [Your enemy hit with 28 points of damage!]
>> [Rabbit has died.]

How does it work? We use double brace initialization to define an anonymous subclasses with pre-initialized fields. And, a custom binding mechanism is used to create instances of that class.

Even though most Java programmers are familiar with double brace initialization, I will explain it for the n00bs. The idiom combines two features of Java: anonymous inner classes and instance initializers (think anonymous constructors). Thus, the first brace creates a new anonymous inner class, the second declares an instance initializer block that is run when the anonymous inner class is instantiated.

Being anonymous, the inner class is not bound to any identifier. Therefore we can not simply use new to create new instances. Instead, we must setup a custom binding mechanism. The name “ScubaArgentine” is passed to Creature’s constructor, which binds name and anonymous class in a static lookup table. The same table is consulted by a static new_ method to create custom instances.

This is implemented in Creature as follows:

private static final Map<String, Class<? extends Creature>> INDEX = new TreeMap();
protected String name;

public Creature(String name) {
    INDEX.put(name, this.getClass());
    this.name = name;
}

public static Creature new_(String name) { try {
    return (Creature) INDEX.get(name).getDeclaredConstructors()[0].newInstance(name);
}
//handle
     catch(Exception e) { throw new RuntimeException(e); }}

Once you’re done with the example guy, it’s time to enter The Array.

You have six foes.

new Creature("IndustrialRaverMonkey") {{
    life = 46;
    strength = 35;
    charisma = 91;
    weapon = 2;
}};

new Creature("DwarvenAngel") {{
    life = 540;
    strength = 6;
    charisma = 144;
    weapon = 50;
}};

new Creature("AssistantViceTentacleAndOmbudsman") {{
    life = 320;
    strength = 6;
    charisma = 144;
    weapon = 50;
}};

new Creature("TeethDeer") {{
    life = 655;
    strength = 192;
    charisma = 19;
    weapon = 109;
}};

new Creature("IntrepidDecomposedCyclist") {{
    life = 901;
    strength = 560;
    charisma = 422;
    weapon = 105;
}};

new Creature("Dragon") {{
    life = 1340;     // tough scales
    strength = 451;  // bristling veins
    charisma = 1020; // toothy smile
    weapon = 939;    // fire breath
}};

Creature dwary = new DwemthysArray(new_("IndustrialRaverMonkey"),
                                   new_("DwarvenAngel"),
                                   new_("AssistantViceTentacleAndOmbudsman"),
                                   new_("TeethDeer"),
                                   new_("IntrepidDecomposedCyclist"),
                                   new_("Dragon"));

In Ruby, DwemthysArray is a subclass of Array that delegates unknown methods to its first creature. Whenever the first creature dies, it is removed from the array, and thus, a new enemy emerges.

Alas, the implementation of DwemthysArray in Java is not as elegant. Why? Java has neither method_missing hook, nor does it support duck typing. And even worse, Java arrays are a built-in type which can not be subclasses. Thus, to keep the demons of static type checking at bay, we extend Creature and include the array as private array field only.

In order to let a new creature emerge whenever the current is found to be dead, we must intercept any boolean check that compares the life field against zero. As we can not override a field access, the Java implementation uses alive() wherever live <= 0 is used in Ruby.

Edit: Clinton Begin has refactored my code to be even MOAR META!!!. He implements DwemthysArray using a dynamic proxy, which is much closer to the original Ruby code.

Here, I shall unmask the Array Creature for you:

public class DwemthysArray extends Creature {

    private Creature[] array;

    public DwemthysArray(Creature... array) {
        super("$");
        this.array = array;
    }

    public boolean alive() {
        if (life > 0) return true;
        if (array.length == 0) {
            puts( "[Whoa.  You decimated Dwemthy's Array!]" );
            return false;
        } else {
            life = array[0].life;
            strength = array[0].strength;
            charisma = array[0].charisma;
            weapon = array[0].weapon;
            name = array[0].name;
            array = Arrays.copyOfRange(array, 1, array.length);
            puts( "[Get ready. %s has emerged.]", this.name );
            return true;
        }
    }

}

“Fight the Array and the monsters will appear as you go. Godspeed and may you return with harrowing tales and nary an angel talon piercing through your shoulder.”

Start here:

while (dwary.alive() && r.alive()) {
    puts( r );
    r.send( gets().charAt(0), dwary );
}

The remainder of the implementation is (except for some additional syntactic noise) almost a genuine port of the Ruby code. Reflection is used to dispatch Rabbit’s overloaded operators to methods and to mimic the output of Ruby’s inspect method. And there are a couple of static convenience methods, like puts and gets.

The full source code is available online.

How to write an Executable Grammar – III


Tuesday, May 13th, 2008

Today, we finish our executable grammar series. So far, we implemented an executable PEG parser, and turned it into a small DSL language. What remains to be done is taming the big O beast. As it is implemented now, the parser has exponential time complexity. We will use memoization to get the same parser run in linear time.

What does “running in exponential/linear time” mean? In computer science, we use the big O notation to describe how a program’s usage of resources is affected by the size of input. Given a program with an input of size n we use O( f(n) ) to denote the upper bound f(n) of its time or space performance.

  • O(kn) is exponential complexity — for example, solving the traveling salesman problem with dynamic programming has exponential time performance.
  • O(n2) is quadratic complexity — for example, sorting a list of n elements with bubble sort has quadratic time performance.
  • O(n log n) is linearithmic complexity — eg, sorting a list of n elements with heapsort has linearithmic (also called, en-log-en) time performance.
  • O(n) is linear complexity — eg, finding an item in an unsorted list of n elements has linear time performance.
  • O(log n) is logarithmic complexity — eg, finding an item in an sorted list with binary search has logarithmic time performance.
  • O(1) is constant complexity — eg, doing a look-up in a hash table with n entries, or, accessing an element by index in an array list.

Now, lets have a look at our current PEG implementation.

Example 1: given last time’s calculator grammar, the following expression with an input of 25 characters results in 1257 matches against the input string.

Calculator =~ "14 + 15 * 3 + 2 * (5 - 7)"

Example 2: Now, let’s consider an input of twice the size and see what happens. Given the following expression with 50 characters we observe a total of 12082 matches against the input string. That is, twice the length of input results in 10 times the number of matches.

Calculator =~ "14 + 15 * 3 + 2 * (5 - 7 + 14 + 15 * 23 / (5 - 7))"

Given an input string of length n the parser runs with O(kn) time performance, that is exponential complexity. Why? PEG parsers have backtracking through the ordered choice pattern, and unlimited look ahead through the predicate patterns. Thus it may happen that the same expressions are applied multiple times at the same position of input. The further in the input, the more times. Considering our second example, alone at position 47, the number expression is matched 730 times!

Given such terrible time complexity, executable grammers are barely usable in practice. Luckily for us, the matching functions of a PEG parser have no side-effects, and hence, we can use memoization to tame the big O beast.

PEG implementations that use memoization are known as packrat parser.

Memoization is an optimization technique to speed up computer programs. It “remembers” the result of function calls for a given input. Subsequent calls with the same input return the remembered result rather than recalculating. This is a classic trade-off, we gain speed by using more space.

As a limitation, memoization works only if the function can be replaced with its value without changing the program. That is, given the same input, the function must always return the same result. In the jargon of computer science, this is called referential transparency.

Matching a PEG expression against the input is referentially transparent. The input of the matching function is given by the tuple (expression, position) and the result is either a tuple (match, position + n) or failure. As both input string and grammar are immutable, calling the match function with the same input always results in the value. Hence, we go forward an implement

class Expression
  alias :calculate_match :match
  def match(input)
    input.memoize_match(self)
  end
end

class StringScanner
  def bucket
    @memo = [] if @memo.nil?
    bucket = @memo[self.pos]
    @memo[self.pos] = bucket = {} if bucket.nil?
    return bucket
  end
  def memoize_match(expr)
    bucket = self.bucket
    unless bucket.include? expr
      bucket[expr] = nil, self.pos // break recursion!
      match = expr.calculate_match(self)
      bucket[expr] = match, self.pos
    else
      match, self.pos = bucket[expr]
    end
    return match
  end
end

Running the two examples again, we observe 119 matches against the input of 25 characters and 218 matches against the input of 50 characters. Thus, double the inputs results in double the number of matches. Thus, our PEG parser is indeed running with linear complexity.

Well done!

However, the price we pay is linear space complexity with a rather large constant. Why? Memoization remembers all result values for all matches. Using memoization we gain speed by using more space. No black magic without sacrifice.

There are other implementations of PEG grammars that run in linear time without that trade-off. However, they are not based on executable grammers but rather employ a dedicated virtual machine! I might cover that in a future post.

How to write an Executable Grammar – II


Thursday, May 1st, 2008

Today, we continue the executable grammar series. We turn last time’s executable PEG parser into a DSL, that is, a domain specific language. Being executable, our parser is already half way there, so what is missing?

A DSL is a little language designed for a very narrow purpose. In contrast to general-purpose languages designed to handle arbitrary computational tasks, DSLs are specific to particular domains. There are two ways to create a DSL

  • As an external DSL: inventing syntax and semantics from scratch, and building an interpreter or compiler yourself.
  • As an internal DSL: embedded in an existing general-purpose language by adding or changing methods, operators, and default actions.

An advantage of the second approach is that you save time because you don’t have to write and debug a new language. Leaving you more time to focus on the problem confronting the end-user. Among the disadvantages is that the DSL will be constrained by the syntax and the capabilities of the underlying general-purpose language. Furthermore, building on top of another language often means that the full power of the base language is available to the end-user, which may be a plus or minus depending on the circumstances.

(above partly quoted from Artima: Creating DSLs with Ruby)

The grammar of a parser is one classic example of a DSL. Most parser generator frameworks require the grammar to be written in an external DSL using BNF notation. This article explores the other approach, building an internal DSL on top of Ruby.

Example 1: We will use a calculator with the following PEG grammar and semantics as motivating example for this article. The goal is that its definition in our DSL reads like a grammar rather than like general-purpose source code.

Sum  := Add | Sub | Prod
Add  := Prod & "+" & Sum
Sub  := Prod & "-" & Sum
Prod := Mult | Div | Atom
Mult := Atom & "*" & Prod
Div  := Atom & "/" & Prod
Atom := NUMBER | "(" & Sum & ")"

Add  = { |a, b| a + b }
Sub  = { |a, b| a - b }
Mult = { |a, b| a * b }
Div  = { |a, b| a / b }

Implementing a DSL in Ruby language typically involves

  • Operator overloading
  • Monkey patching
  • Overriding language hooks
  • Scoping and evaluation
  • Keyword parameters

Operator overloading has been covered last time already. In order to compose expressions into sequence respectively ordered-choice patterns, we have defined an ampersand and a pipe operation on the Expressions class.

Example 2: The motivation for operator overloading, and later monkey patching etcetera, is to improve the readability of a DSL. Just compare

number.or(Expression.from_string("(").and(sum).and(Expression.from_string("(")))

with the much more concise and elegant

NUMBER | "(" & Sum & ")"

One thing we can not yet do, is to use a simple string whenever an expression is expected. This is where monkey patching joins the game. A monkey patch extends or modifies existing code without touching the original sources. In any dynamic language, this is a quite natural thing to do. Even at runtime. Whereas with a static beast like Java this can not be done unless resorting to black magic. And even then, Java’s license prohibits specifically any change to the core classes. Which is also why AspectJ does not allow you to patch core classes. Other static languages are more flexible, consider for example C#’s extension methods.

However, monkey patching is not without controversy! Defining new methods on core classes like String does not scale up to multiple users. Imagine that our parser library is used in a large application. Other libraries, or even the application itself, might define the same method on String—but with different behavior! Welcome to yet another path that leads to the gates of madness. There are some solutions underway to resolve this conflict, but a silver bullet has yet to been found.

Even the term “monkey patching” in itself is not without controversy. Some consider it bad choice because it does not sound scientific enough. Others, dislike it for having been coined somewhere else.

Anyway, we take the path to madness and send our monkeys to work. In order to disguise strings as expressions, we add two helper methods to Expression and eventually extend the String class as follows

class Expression
  def to_expr
    self
  end
  def self.from_string(term)
    self.new Regexp.new("^#{Regexp.escape(term)}")
  end
end

class String
  def to_expr
    Expression.from_string(self)
  end
  def & expr
    self.to_expr & expr
  end
  def | expr
    self.to_expr | expr
  end
end

This set of changes allows strings to be uses as the left-hand side argument of any sequence or ordered-choice pattern. In order to achieve the same for right-hand side arguments, we must redefine SequenceExpression#& and OrderedChoiceExpression#| as well

class SequenceExpression
  def & expr
    @children << expr.to_expr; self
  end
end

class OrderedChoiceExpression
  def | expr
    @children << expr.to_expr; self
  end
end

Well done, monkeys!

A parser is not complete without semantics. No big deal here, we just define the class Parser to be an expression with associated lambda function

class Parser < Expression
  attr_accessor :expression, :semantics
  def initialize(expression=nil)
    @expression = expression
    @semantics = lambda { |args| args }
  end
  def private_match(input)
    match = @expression.match(input)
    return if match.nil?
    @semantics.call(*match)
  end
end

Example 3: Time for a review. Let’s put it all together, and see what we have achieved so far and what might still be missing. To keep things simple, we implement an addition only

number = Parser.new
number.expression = Expression.new /(-)?\d+/
number.semantics = lambda { |str| str.to_i }
add = Parser.new
add.expression = number & "+" & number
add.semantics = lambda { |a, x, b| a + b }
input = StringScanner.new "3 + 4"
puts add.match(input)

Yikes, how source codish! We would rather like to write something as concise as

add := number & "+" & number
add = { |a, b| a + b }

Magic to the rescue! Next on our list of DSL ingredients are overriding a language hook and scoping. The first and most important thing to be done now, is to hide all the parser creation clutter. We just want to spell out an expression’s name, and—tada, magic!—a new parser is to be created the first time and later the same parser to be returned.

We achieve this by overriding a language hook. A language hook is a special method that gets notified of language-related events. The most important language hooks in Ruby are

  • Kernel#method_missing
  • Module#const_missing
  • Module#method_added
  • Module#method_removed
  • Module#included
  • Module#extended

By default, the first two hooks raise an error and the remaining hooks do nothing. As parsing expressions are most like constants, we override const_missing to assign a newly created parsing expression whenever a constant is missing.

module Grammar
  def const_missing(symb)
    const_set symb, Parser.new
  end
end

Since Ruby constants must start with an uppercase letter, the parsing expressions of our DSL have the some limitation. We can live with that. Now, whenever an uninitialized constant name is used, the custom hook is called and assignes a newly created parsing expression to the so far uninitialized constant. If the same constant name is used later, it is on longer uninitialized but refers to the proper parsing expression. However, we do not want this magic to happen for all Ruby code, but only for our little grammar. Hence we use a module to limit the scope of our hook.

NB: it would be nice to limit the scope of monkey patching as well. Alas, this is not as simple. Method definitions, and thus monkey patches, are always applied global rather than being properly scoped.

Furthermore, we use module methods to define common expressions, such as number. Another module can later extend Grammar in order to import both constant resolution and predefined expressions.

module Grammar
  def number
    number = Parser.new
    number.expression = Expression.new /^\d+/
    number.semantics = lambda { |string| string.to_i }
    number
  end
end

And, we define concise aliases for the setters of a parser’s expression and semantics. Again, our purpose is to improve ease of writing and reading. For the same purpose, we define Grammar#=~ for expression matching and decide that Root shall be to the entry point of any grammar module.

class Parser
  def <= expr
    self.expression = expr
  end
  def do &lambda
    self.semantics = lambda
  end
end

module Grammar
  def =~ string
    self::Root.match StringScanner.new(input)
  end
end

Example 4: Sweetening things up with all that sugar, we get

module Add
  extend Grammar
  Root <= number & "+" & number
  Root.do { |a, plus, b| a + b }
end

puts Add =~ "3 + 4"

Done … or rather, almost done. There is one more issue left. The lambda expression for our addition still looks quite ugly with this superfluous plus parameter. A sequence pattern returns all its children as match. Therefore, the addition results in [3, "+", 4] instead of just [3, 4]. It would be nice to omit the plus.

But even worse, consider the calculator’s "(" & Sum & ")" expression, which we must associate with the rather pointless { |a,b,c| b } lambda procedure. Here, it would be nice to omit the parentheses.

A quick look at the complete grammar of the initial calculator reveals that for any sequence pattern, the string children are to be omitted when doing semantics. Hence we introduce an @omit instance variable in Expression and redefine SequenceExpressions#match to omit omitable children. Please refer to the full source code for details.

Now, our DSL is finally done.

Example 5: Finally, we can write the initial example almost as given

module Calculator
  extend Grammar

  Root    <= Sum
  Sum     <= Add | Sub | Prod
  Add     <= Prod & "+" & Sum
  Sub     <= Prod & "-" & Sum
  Prod    <= Mult | Div | Atom
  Mult    <= Atom & "*" & Prod
  Div     <= Atom & "/" & Prod
  Atom    <= number | Parens
  Parens  <= "(" & Sum & ")"

  Add.do    { |a, b| a + b }
  Sub.do    { |a, b| a - b }
  Mult.do   { |a, b| a * b }
  Div.do    { |a, b| a / b }

end

puts Calculator =~ "14 + 15 * 3 + 2 * (5 - 7)" # => 55

I hope to have shown that designing your own parser framework is not only fun but also rather simple. And, that there is more in DSLs than excessive abuse of keyword parameters. In the final installement of this series, I will show how to use memoization to tame the big O beast.

Please checkout the the full source code.

How to write an Executable Grammar


Monday, April 21st, 2008

There has been quite some noise about executable grammars recently. Executable grammars are also known as parser combinators. In today’s installment I will show you how to write your own executable grammar. You will see that it is as simple as using a parser generator like Yacc and Antlr. Actually, I’d say it is even simpler. Me myself, I never figured out how to use these parser generator beasts. They are just too arcane. Yet, I wrote my own executable grammar in less than 3 hours.

In a first step, we gonna write an executable grammar for the expression

raining & ( dogs | cats ) *

There we go.

The basic idea of an executable grammar is to build a parser’s grammar as a tree of objects that know how to parse their grammar production. How? Just open the famous Design Pattern book on page 243, and the recipe is at your fingertips: the Interpreter pattern.

Interpreter pattern.

We will step through that pattern today, implementing a scanner-free parsing expression grammar. But there is more magic around the corner. In a follow up we will expose our parser combinators as a DSL, and eventually, use the black art of memoization to turn them into a full-fledged packrat parser running in linear time.

Today’s language of choice is Ruby, so we start with:

class Expression
  def initialize(regexp)
    @regexp = regexp
  end
  def match(input)
    input.scan @regexp
  end
end

The class Expression is the basic building block of our executable grammar. Each instance of Expression knows how to parse itself, and is thus a little parser of its own. So far, expressions are nothing more than glorified regular expressions. Later, we will add subclasses to match sequence, alternation and repetition patterns.

We must require ’strscan’ as we are using StringScanner#scan to match regular expressions. StringScanner keeps a scan pointer. When scanning for a regular expression, the match must occur at the character after the scan pointer. If a regular expression matches, the scan pointer moves just beyond the last character of the match, ready to scan again from the next character onwards. In other words, StringScanner consumes the input.

require 'strscan'

Example 1: the following code will successfully match the first token of our sample input.

input = StringScanner.new "raining cats cats dogs cats"
grammar = Expression.new /raining/
p grammar.match(input)

So far our expressions are quite useless for real parsing. A real parser must not only consume its input on success, but also backtrack on failure. Not to mention such fancy magic as whitespace skipping.

In order to backtrack on failure, we are going to rewrite Expression#match. This is done by saving the input’s position before attempting to perform a match, and resetting it upon failure. Hence, we rewrite the match method as follows

  save = input.pos
  match = input.scan @regexp
  input.pos = save if match.nil?
  return match

As we are later going to redefine match in each subclass, it would be quite painstaking to repeat this boilerplate code all the time. Patterns to the rescue! There is a nice pattern to be applied here, the Template Method pattern (GOF p 325). We delegate the actual matching to a template method called private_match, which subclasses may override to redefine the actual matching.

class Expression
  def match(input)
    save = input.pos
    input.skip_whitespace
    match = self.private_match(input)
    input.pos = save if match.nil?
    return match
  end
  def private_match(input)
    input.scan @regexp
  end
end

Above, we are even getting the whitespace magic done! Before any attempt to perform private_match, leading whitespace if skipped with a call to StringScanner#skip_whitespace. As StringScanner does not offer such a method by default, we extend the class dynamically as follows:

class StringScanner
  def skip_whitespace
    self.skip /\s+/
  end
end

Next, we are going to subclasses Expression to match sequence, ordered-choice and repetition patterns. This might sound scarier than it actually is. We just follow the recipe of the Interpreter pattern (GOF p 243), adding a new subclass for each kind of pattern.

SequenceExpression matches each of its child-expressions in sequence. If any of the child-expressions fails, the whole expression fails and backtracks.

class SequenceExpression < Expression
  def initialize
    @children = []
  end
  def & expr
    @children << expr; self
  end
  def private_match(input)
    @children.collect { |expr|
      match = expr.match input
      return if match.nil?
      match
    }
  end
end

OrderedChoiceExpression will return the first successful match among its child-expressions. If none of the child-expressions succeeds, the whole expression fails.

NB: for those familiar with context free grammars, there is a gotcha here. The ordered-choice pattern of PEG grammars differs from a CFG’s alternation pattern. Given a series of alternatives, PEG will take the first match, where as CFG will walk both paths and choose the fullest match. For a CFG, E = (ab | a) and F = (a | ab) are the same. For a PEG they differ: both branches of E possibly match, whereas the second branch of F will never match since the first branch is always taken.

class OrderedChoiceExpression < Expression
  def initialize
    @children = []
  end
  def | expr
    @children << expr; self
  end
  def private_match(input)
    @children.each { |expr|
      match = expr.match input
      return match if not match.nil?
    }
    return nil
  end
end

RepetitionExpression tries to match its child-expression as many times as possible. Even if the child-expression does not match at all, the whole expression succeeds.

class RepetitionExpression < Expression
  def initialize(expression)
    @child = expression
  end
  def private_match(input)
    matches = []
    while match = @child.match(input)
      matches << match
    end
    return matches
  end
end

Eventually, we apply the Factory Method pattern (GOF p 107) to add support for these patterns to the base class.

class Expression
  def & expr
    SequenceExpression.new & self & expr
  end
  def | expr
    OrderedChoiceExpression.new | self | expr
  end
  def star
    RepetitionExpression.new self
  end
end

Example 2: the following code will successfully match the entire grammar that has initially been given as an example. First, we create expressions for the terminals “raining”, “cats”, and “dogs”. Then, we combine them using our little factory methods, thus creating a full parser.

input = StringScanner.new("raining cats cats dogs cats")
raining = Expression.new /raining/
cats = Expression.new /cats/
dogs = Expression.new /dogs/
grammar = raining & (dogs | cats).star
p grammar.match(input)

And done!

Well, not completely done. In order to implement a full parsing expression grammar we must add four additional subclasses: OneOrMoreExpression, ZeroOrOneExpression, AndPredicate, and NotPredicate. Their implementation follows the same schema as RepetitionExpression. For details, please refer to the full source code of today’s installment.

For.example is Digg proof thanks to caching by WP Super Cache!