There has been quite some noise about executable grammars recently. Executable grammars are also known as parser combinators. In today’s installment I will show you how to write your own executable grammar. You will see that it is as simple as using a parser generator like Yacc and Antlr. Actually, I’d say it is even simpler. Me myself, I never figured out how to use these parser generator beasts. They are just too arcane. Yet, I wrote my own executable grammar in less than 3 hours.
In a first step, we gonna write an executable grammar for the expression
raining & ( dogs | cats ) *
There we go.
The basic idea of an executable grammar is to build a parser’s grammar as a tree of objects that know how to parse their grammar production. How? Just open the famous Design Pattern book on page 243, and the recipe is at your fingertips: the Interpreter pattern.

We will step through that pattern today, implementing a scanner-free parsing expression grammar. But there is more magic around the corner. In a follow up we will expose our parser combinators as a DSL, and eventually, use the black art of memoization to turn them into a full-fledged packrat parser running in linear time.
Today’s language of choice is Ruby, so we start with:
class Expression
def initialize(regexp)
@regexp = regexp
end
def match(input)
input.scan @regexp
end
end
The class Expression is the basic building block of our executable grammar. Each instance of Expression knows how to parse itself, and is thus a little parser of its own. So far, expressions are nothing more than glorified regular expressions. Later, we will add subclasses to match sequence, alternation and repetition patterns.
We must require ’strscan’ as we are using StringScanner#scan to match regular expressions. StringScanner keeps a scan pointer. When scanning for a regular expression, the match must occur at the character after the scan pointer. If a regular expression matches, the scan pointer moves just beyond the last character of the match, ready to scan again from the next character onwards. In other words, StringScanner consumes the input.
require 'strscan'
Example 1: the following code will successfully match the first token of our sample input.
input = StringScanner.new "raining cats cats dogs cats"
grammar = Expression.new /raining/
p grammar.match(input)
So far our expressions are quite useless for real parsing. A real parser must not only consume its input on success, but also backtrack on failure. Not to mention such fancy magic as whitespace skipping.
In order to backtrack on failure, we are going to rewrite Expression#match. This is done by saving the input’s position before attempting to perform a match, and resetting it upon failure. Hence, we rewrite the match method as follows
save = input.pos
match = input.scan @regexp
input.pos = save if match.nil?
return match
As we are later going to redefine match in each subclass, it would be quite painstaking to repeat this boilerplate code all the time. Patterns to the rescue! There is a nice pattern to be applied here, the Template Method pattern (GOF p 325). We delegate the actual matching to a template method called private_match, which subclasses may override to redefine the actual matching.
class Expression
def match(input)
save = input.pos
input.skip_whitespace
match = self.private_match(input)
input.pos = save if match.nil?
return match
end
def private_match(input)
input.scan @regexp
end
end
Above, we are even getting the whitespace magic done! Before any attempt to perform private_match, leading whitespace if skipped with a call to StringScanner#skip_whitespace. As StringScanner does not offer such a method by default, we extend the class dynamically as follows:
class StringScanner
def skip_whitespace
self.skip /\s+/
end
end
Next, we are going to subclasses Expression to match sequence, ordered-choice and repetition patterns. This might sound scarier than it actually is. We just follow the recipe of the Interpreter pattern (GOF p 243), adding a new subclass for each kind of pattern.
SequenceExpression matches each of its child-expressions in sequence. If any of the child-expressions fails, the whole expression fails and backtracks.
class SequenceExpression < Expression
def initialize
@children = []
end
def & expr
@children << expr; self
end
def private_match(input)
@children.collect { |expr|
match = expr.match input
return if match.nil?
match
}
end
end
OrderedChoiceExpression will return the first successful match among its child-expressions. If none of the child-expressions succeeds, the whole expression fails.
NB: for those familiar with context free grammars, there is a gotcha here. The ordered-choice pattern of PEG grammars differs from a CFG’s alternation pattern. Given a series of alternatives, PEG will take the first match, where as CFG will walk both paths and choose the fullest match. For a CFG, E = (ab | a) and F = (a | ab) are the same. For a PEG they differ: both branches of E possibly match, whereas the second branch of F will never match since the first branch is always taken.
class OrderedChoiceExpression < Expression
def initialize
@children = []
end
def | expr
@children << expr; self
end
def private_match(input)
@children.each { |expr|
match = expr.match input
return match if not match.nil?
}
return nil
end
end
RepetitionExpression tries to match its child-expression as many times as possible. Even if the child-expression does not match at all, the whole expression succeeds.
class RepetitionExpression < Expression
def initialize(expression)
@child = expression
end
def private_match(input)
matches = []
while match = @child.match(input)
matches << match
end
return matches
end
end
Eventually, we apply the Factory Method pattern (GOF p 107) to add support for these patterns to the base class.
class Expression
def & expr
SequenceExpression.new & self & expr
end
def | expr
OrderedChoiceExpression.new | self | expr
end
def star
RepetitionExpression.new self
end
end
Example 2: the following code will successfully match the entire grammar that has initially been given as an example. First, we create expressions for the terminals “raining”, “cats”, and “dogs”. Then, we combine them using our little factory methods, thus creating a full parser.
input = StringScanner.new("raining cats cats dogs cats")
raining = Expression.new /raining/
cats = Expression.new /cats/
dogs = Expression.new /dogs/
grammar = raining & (dogs | cats).star
p grammar.match(input)
And done!
Well, not completely done. In order to implement a full parsing expression grammar we must add four additional subclasses: OneOrMoreExpression, ZeroOrOneExpression, AndPredicate, and NotPredicate. Their implementation follows the same schema as RepetitionExpression. For details, please refer to the full source code of today’s installment.