Superpowers in Library Design
Wednesday, November 25th, 2009
When designing libraries, using superpowers can be for the good.
Superpowers are language features so powerful that even seasoned developers are likely to use them for the evil … wreaking havoc on maintainers and co-workers alike. Just think of hooking into calls to missing methods, modifying classes at runtime, or even rewriting the call stack of a running program!
I first heard the term superpowers in 2008, at an OOPSLA workshop by Martin McClure and Travis Griggs. In a room full of hackers and language designers, everyone would tell about their most spectacular use of obscure language features, and the room then voted on whether that use was for the good or for the evil. Most were voted for the evil!
Some languages ship without superpowers at all. The fear that programmers are gonna use these superpowers for the evil is understandable. However, excluding superpowers from a language may fatally limit the power of library features.
Just think of object serialization: deserializing objects from a binary stream would be impossible without the superpowers to allocate objects without calling any constructor, to call private methods, and to change the value of final fields. And in fact, even Java’s ObjectInputStream wouldn’t get along without exactly these superpowers. They are all available to Sun’s engineers (and everyone aware of sun.misc.Unsafe, go figure) but not to Joe Average programmer.
In this post, I’ll show how Niko and I are using Smalltalk’s superpowers for the good of Phexample users.
In the initial design of Phexample, boolean expectation matchers had a somewhat awkward syntax, which users soon nicknamed “lolcat syntax”:
#(Lorem ipsum dolor) should be isEmpty.
The rationale for this design was that we saw no other way to provide users with an error message whose text refers to #isEmpty by name! If we change the syntax to #() isEmpty should be true, the name of the boolean property is out of reach (i.e., not on the stack) when the matcher fails.
RSpec, by the way, does not suffer from lolcat syntax because Ruby’s boolean properties use a question mark rather than starting with a verb. Users can write %w{Lorem ipsum dolor} should be empty? without getting caught in the uncanny valley of DSL design.
The way out of the uncanny valley is to allow users to put the boolean property first, as in
#(Lorem ipsum dolor) isEmpty should be true.
and then, whenever an expectation fails, walk back in both stack and bytecode to recover the property name. There are two challenges here:
- to walk back in the stack, from where the expectation fails internally to the method’s stack frame where should be true was sent,
- to walk back in the bytecode, from the current program counter to the call-site that is lexically before the initial #should call.
To make things more complicated, there can be an arbitrary number of domain messages between #true and #should, as for example in negated expectations.
The first challenge, i.e. getting back to the stack frame of the DSL call, is solved by walking back the stack until we are outside of the current matcher instance:
frame := thisContext.
[ frame := frame sender.
frame receiver == self ] whileTrue.
thisContext is a pseudovariable, like this and super, that provides access to the current stack frame, that is, the stack frame of the above code snippet. Then, we use #sender to walk up the stack until we reach the latest frame outside of the current matcher instance. In the above comparison frame receiver refers to the value of the “self” of the stack frame, and self to the value of the “self” of the code snippet itself. No one said Superpowers ain’t confusing.
The second challenge, i.e. recovering the property name, is solved by recovering all message sends (i.e. method call-sites) from the bytecode of the method where should be true was sent:
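To see what such a walk passes over, here is a minimal sketch that prints the whole sender chain, innermost frame first. It assumes a Pharo image and a Playground; Transcript, #class, #name and #selector are not part of the snippet above, and the exact output depends on the image.

```smalltalk
"Print receiver class and method selector of every frame, innermost first."
| frame |
frame := thisContext.
[ frame isNil ] whileFalse: [
    Transcript
        show: frame receiver class name;
        show: '>>';
        show: frame method selector printString;
        cr.
    frame := frame sender ].
```

Run from inside a matcher, the innermost lines would all belong to the matcher; the first frame whose receiver is no longer the matcher itself is where the loop above stops.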
scanner := InstructionStream on: frame method.
sent := Stack new.
scanner scanFor: [ :bytecode |
sent push: scanner selectorToSendOrSelf.
(frame pc - 1) <= scanner pc ].
So, frame method now refers to the method where should be true was sent. In this method, we scan from the beginning of the bytecode up to where we are, and push all sent messages (i.e. method names) on a stack. As Niko already mentioned in his “no more lolcats” post: bytecodes in Smalltalk have different sizes, thus we cannot just walk backwards from where we are.
Scanning bytecodes with #scanFor: stops when the last expression of the block evaluates to true. Since the current program counter points to the bytecode just after the send-site of true, we subtract one when comparing the counter of where we are with the scanner’s counter.
So now we’ve got all message sends on a stack up to true. We keep dropping message names from the stack, up to and including #should:
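The same scanning loop can be tried on any compiled method. The sketch below collects every selector sent by Object>>#printString. That method is only a convenient stand-in (an assumption, any method in your image will do), and the isSymbol filter relies on #selectorToSendOrSelf answering the scanner itself for bytecodes that are not sends, as its name suggests.

```smalltalk
"Collect, in bytecode order, all selectors sent by a method."
| scanner selectors |
scanner := InstructionStream on: Object >> #printString.
selectors := OrderedCollection new.
scanner scanFor: [ :bytecode |
    | selector |
    selector := scanner selectorToSendOrSelf.
    selector isSymbol ifTrue: [ selectors add: selector ].
    false "never stop early, scan the whole method" ].
selectors
```

Answering false from the block scans the method to its end; the matcher instead answers true once the scanner reaches the send-site of true.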
[ sent isEmpty ifTrue: [ ^'<unknown>' ].
sent pop == #should ] whileFalse.
sent top isSymbol ifFalse: [ ^'<unknown>' ].
^sent top
which leaves us with the name of the boolean property that was sent before the should be true sequence. Of course, we guard against running out of stack elements while doing so.
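As a worked example, the sketch below replays the dropping loop on a hand-built stack holding only the selectors that a scan of #(Lorem ipsum dolor) isEmpty should be true would collect (non-send entries are left out). The stack contents are an illustration, not output captured from Phexample.

```smalltalk
| sent |
sent := Stack new.
#(#isEmpty #should #be #true) do: [ :each | sent push: each ].
"Pops #true, then #be, then #should, and stops."
[ sent pop == #should ] whileFalse.
sent top "what remains on top is #isEmpty"
```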
Thus now, when executing
#(Lorem ipsum dolor) isEmpty should be true.
we get the message text
TestFailure: expected #isEmpty to be true
IM OUTTA YR LIBRARY.ST
KTHXBYE
