Random notes on software, programming and languages.
By Adrian Kuhn

Archive for the ‘Superpowers’ Category

Pharo Superpower: Respond to any Message


Tuesday, January 19th, 2010

For the fifth week in a row we’re stepping into the Pharo superpowers booth. Today we shall learn how to create objects that respond to any message. That is, objects that respond to a message without implementing a corresponding method. Again, as with sending any message, this superpower can be used for the good (if used with care) and I will thus discuss an example that I consider good use below.

When a message is sent to a Smalltalk object, the message name is looked up in the method dictionary of the object’s class and its superclasses. If a method whose name matches the message is found, that method is executed. However, if no matching method is found a special message is sent to the object, which is

#doesNotUnderstand:

By default, the implementation of #doesNotUnderstand: opens a debugger (or more precisely, the pre-debugger dialog that we all know from test-driven development). However, we are free to override #doesNotUnderstand: and thus respond to any unknown message.
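To see the default behavior without a debugger popping up, here is a minimal sketch that catches the exception instead. It assumes a current Pharo image where MessageNotUnderstood answers the undelivered Message via #message; the message is sent with #perform: only so that the snippet compiles without the compiler asking about an unknown selector.

```smalltalk
| selector |
"A plain Object has no #zork method, so the default #doesNotUnderstand:
 signals MessageNotUnderstood. The exception carries the Message object
 that could not be delivered."
selector := [ Object new perform: #zork ]
    on: MessageNotUnderstood
    do: [ :error | error message selector ].
selector "=> #zork"
```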

As a (dadaistic) example, let’s implement a Lorem ipsum object

Object subclass: #Lorem instanceVariableNames: 'expects'

with an ipsum constructor

Lorem class >> ipsum
    ^ self new

and the following two methods

Lorem >> initialize
    expects := #(dolor sit amet "to be continued ad nauseam..." nil)

Lorem >> doesNotUnderstand: aMessage
    ^ aMessage selector == expects first
        ifTrue: [ expects := expects allButFirst. self ]
        ifFalse: [ super doesNotUnderstand: aMessage ]

So, if you ever doubted that virtually any English sentence is valid Smalltalk, here is your proof :)

Lorem ipsum dolor sit amet.

This executes as valid Smalltalk code, without ever having implemented any #dolor, #sit or #amet method! If, however, we deviate from the canonical Lorem ipsum sequence we’ll get the usual MessageNotUnderstood error.

[ Lorem ipsum dolor zork ] should signal: MessageNotUnderstood.

As a more sensible example, let’s consider a list that responds to any message understood by all its elements.

OrderedCollection subclass: #Group.

Group >> eachRespondsTo: aSelector
    ^ self allSatisfy: [ :each | each respondsTo: aSelector ]

Group >> doesNotUnderstand: aMessage
    ^ (self eachRespondsTo: aMessage selector)
        ifTrue: [ self collect: [ :each | aMessage sendTo: each ] ]
        ifFalse: [ super doesNotUnderstand: aMessage ]

As you can see, the implementation of #doesNotUnderstand: follows the same pattern as above. We check whether we want to handle the message, and if not, we delegate to the default implementation in Object (which will open a pre-debugger dialog).

Keen readers might have already noted a limitation of the above approach: when you override #doesNotUnderstand: but not #respondsTo:, your object will respond to a new message (by means of #doesNotUnderstand:) but still insist that it does not respond to that message when queried with #respondsTo:.

So we’ll have to override #respondsTo: as well

Group >> respondsTo: aSelector
    ^ (super respondsTo: aSelector) or: [ self eachRespondsTo: aSelector ]

It is a sad but true fact that over 90% of all #doesNotUnderstand: overriders that you’ll find out there do not override #respondsTo: as well, even though they should!

So now our new class is ready for a bunch of expectations (please refer to Phexample for more details on expectation matchers)

g := Group new.
g should not respondTo: #x.
[ g x ] should raise: MessageNotUnderstood.
g add: 2 @ 3.
g add: 3 @ 4.
g add: 1 @ 2.
g should respondTo: #x.
g x should beSameSequence: #(2 3 1).
g y should beSameSequence: #(3 4 2).

BTW, if you are an OSX user and looking for a language that provides this feature by default, take a look at F-Script by Philippe Mougin. F-Script also offers a plethora of awesome features beyond projection of messages, for example it allows you to manipulate the Cocoa objects of any OSX application—at runtime!

As a best practice, you should only override #doesNotUnderstand: and #respondsTo: on your own classes. Just imagine what might happen when two or more stakeholders attempt to override #doesNotUnderstand: in, for example, Collection: only one of the extensions will eventually remain, the others are silently overwritten, and the system is left in an undefined state.

If you know more good (or evil, uoarharhar) uses of #doesNotUnderstand: share them in the comments.

Hackety hacking!

Pharo Superpower: Send Any Message


Tuesday, January 12th, 2010

In Pharo Smalltalk you may send any message even if its name is not known at compile time. Sending any message is one of the superpowers that can be used for the good, even when doing application programming, therefore I will discuss best practices at the end.

First of all, recall that “sending a message” is Smalltalk jargon for calling a method. Since sending a message is synchronous in Smalltalk, ie it blocks until the receiver returns, it is basically the same as a method call and the actual difference is thus of a philosophical nature only. (There is an implementation difference deep down at the language’s core, but that shall not be discussed today as it does not matter to programmers.)

BTW, this is the fourth post in the Superpower series.

There are many ways to send any message, mainly due to optional and variable arguments which are not well supported by Smalltalk syntax. The most basic form is

object perform: #symbol withArguments: anArray

Let’s consider a real example

string := 'Lorem'.
string perform: #size. "=> 5"
string perform: #at: with: 1. "=> $L"
string perform: #copyFrom:to: with: 2 with: 4. "=> 'ore'"

If the number of arguments is not known at compile time, we may use

string perform: aSymbol withArguments: anArray.
string perform: aSymbol withEnoughArguments: anArray.

The first expects that the array matches the arity (number of arguments) of the target method; the latter will just use as many arguments as are required. This is most useful to send a message with optional arguments.
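A small sketch of the difference between the two variants, reusing the 'Lorem' string from above. The argument arrays are made up for illustration; the results assume the standard #perform:withArguments: and #perform:withEnoughArguments: found in Pharo's Object.

```smalltalk
| string |
string := 'Lorem'.

"Exact arity: #copyFrom:to: takes two arguments, the array has two elements."
string perform: #copyFrom:to: withArguments: #(2 4). "=> 'ore'"

"Surplus arguments are dropped: #size takes none, #at: takes only the first."
string perform: #size withEnoughArguments: #(2 4). "=> 5"
string perform: #at: withEnoughArguments: #(1 2 3). "=> $L"
```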

So when is sending any message for the good or for the evil?

Whenever possible, try to avoid using #perform: because it is less readable. When a reader of your program looks at a #perform: it is not obvious which message is being sent at runtime. Also, messages that are sent with #perform: will not be shown when browsing all senders of a message. There is one subtle difference here: if the dynamically sent message is stored somewhere as a symbol, at least that symbol will show up when looking for senders. If, however, the dynamically sent message is composed using string concatenation, it won’t show up at all. It might even seem as if its implementers are never used, which can be very confusing to the reader.

For all the above reasons you should only use #perform: when you have good reason to do so. And if you use it, make sure that the dynamically sent messages are all stored as symbols somewhere else. Best of all, make sure that all code involved in dynamically sending messages is encapsulated by one single class.
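To make the two styles concrete, here is a short sketch contrasting them. The variable names are made up, and #asUppercase is just a convenient stand-in selector. Both sends do the same thing at runtime; only the first leaves a trace for the tools.

```smalltalk
| stored built |
"Preferred: the selector is written out as a literal symbol. When this
 line lives in a method, the symbol is found when browsing senders."
stored := #asUppercase.
'lorem' perform: stored. "=> 'LOREM'"

"Discouraged: the selector only exists after concatenation at runtime,
 so no search for senders of #asUppercase will ever find this send."
built := ('as' , 'Upper' , 'case') asSymbol.
'lorem' perform: built. "=> 'LOREM'"
```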

I will provide an example from Hapax’s clustering algorithm. When you do hierarchical clustering, there are different ways to link small clusters into large clusters. The call to this linkage method is buried deep down in the internals of the clustering algorithm, so the ClusteringEngine class uses a strategy pattern to pick the right linkage method. The choice of strategy is stored as a symbol in an instance variable and then used as follows

Object subclass: #ClusteringEngine
    instanceVariableNames: 'distanceMatrix dendrogram linkage'
    classVariableNames: ''

ClusteringEngine >> linkage: aSelector
    linkage := aSelector

ClusteringEngine >> linkage
    ^ linkage

ClusteringEngine >> allLinkageSelectors
    ^ #( averageLinkage centroid completeLinkage meanLinkage singleLinkage wardsMethod )

ClusteringEngine >> run
    (distanceMatrix size - 1) timesRepeat:
        [self findMinimum.
        self perform: linkage].

ClusteringEngine >> averageLinkage
    "implementation omitted..."

"et cetera..."

All code is encapsulated within one class such that a reader can find it all in one place when browsing the source code.

Any other use of #perform:, in particular when string concatenation of selectors is involved, is evil and should be limited to library design, if used at all.

A note regarding performance. Using #perform: is as fast as sending a message the normal way. So contrary to popular belief there is no performance penalty (at least not in Pharo Smalltalk; in other dialects that do use JIT compilation there might be severe performance penalties though).
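If you would rather measure than believe, a rough sketch like the following can be run in a workspace. The iteration count is arbitrary, the numbers depend entirely on your image and virtual machine, and a loop this naive is only good for spotting large differences.

```smalltalk
| direct dynamic |
"Time one million ordinary sends of #size..."
direct := Time millisecondsToRun: [
    1000000 timesRepeat: [ 'Lorem' size ] ].

"...and one million sends of the same message through #perform:."
dynamic := Time millisecondsToRun: [
    1000000 timesRepeat: [ 'Lorem' perform: #size ] ].

"Answer both timings (in milliseconds) side by side."
{ direct. dynamic }
```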

Hackety hacking!

Pharo Superpower: Change of Class


Tuesday, January 5th, 2010

Smalltalk objects are ordered by class hierarchies. But still, an object may change its class membership! Objects are able to move between classes and hierarchies at runtime.

In this post I shall show how to transform a rabbit into a lighthouse!

First let us create two classes Rabbit and Lighthouse with the same format. We use Pharo’s public API to do so.

Object subclass: #Rabbit instanceVariableNames: 'age size'.
Object subclass: #Lighthouse instanceVariableNames: 'latitude longitude'.

Let’s check that both classes are of the same format, but neither is a subclass of the other. (Please refer to Phexample for more information on expectation matchers.)

Rabbit format should = Lighthouse format.
Rabbit should not beKindOf: Lighthouse.
Lighthouse should not beKindOf: Rabbit.

Now, let’s create a rabbit and transform it into an instance of a lighthouse.

r := Rabbit new.
r should beKindOf: Rabbit.
r primitiveChangeClassTo: Lighthouse new.
r should beKindOf: Lighthouse.

And, abra cadabra, we’ve turned r from a Rabbit into a Lighthouse!

To do so we use the #primitiveChangeClassTo: method. This method expects an instance of the target class as argument. However, this instance is only used to determine the target class of the receiver. (In other Smalltalk dialects, as for example Cincom Smalltalk, #changeClassTo: expects a class rather than an instance. We can only guess why Pharo and its sibling Squeak require an instance of the target class. My guess is that this is required as proof that the target class is a valid class, since otherwise it would not have been possible to create an instance of it.)

Hackety hacking!

Post scriptum: please note that it is not possible to change the class of a point to that of an association even though both have two instance variables. This is because both are so-called “compact” classes with a smaller header. We’ll cover that in another superpowers issue.

Pharo Superpower: Use Anything as Class


Monday, December 28th, 2009

In Pharo Smalltalk, not only can you create anonymous classes at runtime, you can use anything as a class. You can create objects whose class is not a class. Mind boggling, ain’t it?

So if an object’s class is not a class, what is it then? Recall that in Smalltalk all classes are objects, thus if a class is not a class it is at least an object. When I first discovered this superpower I thought “this must be a bug in the virtual machine”. However, the Blue Book of Smalltalk-80 confirms that this is by design. The virtual machine of Smalltalk does not require that classes inherit from Behavior.

In this post, I shall use an instance of Interval to create a new object whose class is … well, an instance of Interval rather than a proper class.

The choice of Interval is not a coincidence. In fact, we may only use objects as classes that have at least three instance variables. The first instance variable must refer to the superclass (which need not be a class either, but to keep things simple we’ll use Object in our example), the second instance variable must refer to a method dictionary, and the third instance variable must encode a magic number that specifies the class format.

g := Interval basicNew.
g instVarAt: 1 put: Object.
g instVarAt: 2 put: MethodDictionary new.
g instVarAt: 3 put: Object format.

Next we compile a method that implements primitive #70 into Interval. Primitive #70 can be used to create new instances. So we can use primitive #70 to create an instance of g.

Interval compile: 'primitive70 <primitive: 70>'.
gg := g primitive70.

Let’s verify that gg is really an instance of g. (Please refer to Phexample for more information on expectation matchers.)

gg class should beSameAs: g.
gg class should not beKindOf: Behavior.

Now we can add methods to the dictionary in g’s second instance variable and they become available on gg. We’ll add a method #zork that returns self.

methods := g instVarAt: 2.
methods at: #zork put: CompiledMethod toReturnSelf.
gg zork should = gg.

Unfortunately we cannot write gg should respondTo: #zork since g is not a real class and thus gg cannot send #canUnderstand: to g. Also you might not be able to print or inspect gg for the same reason, depending on the version of Pharo you are using.

Hackety hacking!

Pharo Superpower: Create Anonymous Class


Monday, December 21st, 2009

One of the superpowers in Pharo Smalltalk is to create new classes at runtime. Actually, whenever you accept a class definition in the class browser that very definition is evaluated to create a new class. And since all development in Smalltalk happens eo ipso at runtime, the accepted class definition creates a class at runtime. Superpower for the masses, it can be done.

In this post, I shall cover how to create anonymous classes at runtime. As an example, we’ll create an anonymous subclass of Point that extends Point with a color attribute.

To create an anonymous class, you’ll first need to create an anonymous metaclass. (Hey, nobody said superpowers ain’t confusing!) The newly created metaclass needs a superclass, a method dictionary and a magic format number. Computation of format numbers is explained later.

m := Metaclass new.
m superclass: Point class.
m methodDict: MethodDictionary new.
m setFormat: 156.

Now we can create the actual anonymous class. Classes are instances of their metaclass. For each metaclass there is only one instance, thus if you send #new twice an error is thrown. As above, the newly created class needs a superclass, a method dictionary and a magic format number.

c := m new.
c superclass: Point.
c methodDict: MethodDictionary new.
c setFormat: 136.
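The class/metaclass relationship that both steps above build by hand can be observed on any ordinary class. A quick sketch using Point, with nothing anonymous involved:

```smalltalk
"An instance, its class, the class of that class, and so on."
(3 @ 4) class. "=> Point"
(3 @ 4) class class. "=> Point class, the metaclass whose only instance is Point"
(3 @ 4) class class class. "=> Metaclass"

"The metaclass hierarchy runs parallel to the class hierarchy."
Point class superclass == Point superclass class. "=> true"
```

This is why m gets Point class as its superclass while c gets Point.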

Now we can create an instance of the anonymous class. We’ll verify the new instance to check that it actually meets our expectations. (Please refer to Phexample for more information on expectation matchers.)

p := c x: 3 y: 4.
p asString should = '3@4'.
p class should = c.
p class class should = m.
p should beKindOf: Point.

Next we’ll create accessors for the additional instance variable of c, which shall be named color. And we’ll also override #printOn: to report the color.

c setInstVarNames: { 'color' }.
c compile: 'color
    ^color'.
c compile: 'color: aString
    color := aString'.
c compile: 'printOn: out
    super printOn: out.
    out nextPutAll: '' is ''.
    out nextPutAll: color'.

Again, we’ll verify our instance.

p color should = nil.
p color: #yellow.
p color should = #yellow.
p asString should = '3@4 is yellow'.

I promised to explain how to compute the magic numbers above. The format number encodes both the number of instance variables and the type of an object. For example, an object can be indexable or not. All this is stuffed into a 32-bit number. To compute the format number, we’ll use a method in ClassBuilder that does the bit-fiddling for us.

metaformat := ClassBuilder new
    computeFormat: #normal instSize: 0
    forSuper: Point class ccIndex: 0.
format := ClassBuilder new
    computeFormat: #normal instSize: 1
    forSuper: Point ccIndex: 0.

Important for us are the parameters instSize: and forSuper: which expect the number of instance variables and the superclass of the class to be created. Please note, the number of instance variables should not include inherited instance variables, but only the number of instance variables to be added: which is zero for our metaclass, and one for our colored point class.
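The properties that get packed into a format number can also be read back through the ordinary accessors on Behavior, which is handy for sanity-checking a hand-built class. A minimal sketch on existing classes; the raw value answered by #format depends on the image and virtual machine version, so treat literals such as 156 and 136 above as specific to the image they were computed in.

```smalltalk
"Number of named instance variables, inherited ones included."
Point instSize. "=> 2, namely x and y"

"Whether instances have indexed slots in addition to named ones."
Point isVariable. "=> false"
Array isVariable. "=> true"

"The raw, version-dependent format number itself."
Point format.
```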

Hackety hacking!

Superpowers in Library Design


Wednesday, November 25th, 2009

When designing libraries, using superpowers can be for the good.

Superpowers are language features so powerful that even seasoned developers are likely to use them for the evil … wreaking havoc on maintainers and co-workers alike. Just think of hooking into calls to missing methods, modifying classes at runtime, or even rewriting the call stack of a running program!

I first heard the term superpowers being used in 2008, at an OOPSLA workshop by Martin McClure and Travis Griggs. In a room full of hackers and language designers, everyone would tell about their most spectacular use of obscure language features and then have it voted on whether their use was for the good or for the evil. Most were voted for the evil!

Some languages ship without superpowers at all. The fear that programmers are gonna use these superpowers for the evil is understandable. However, excluding superpowers from a language may fatally limit the power of library features.

Just think of object serialization: deserializing objects from a binary stream would be impossible without the superpowers to allocate objects without calling any constructor, to call private methods, and to change the value of final fields. And in fact, even Java’s ObjectInputStream wouldn’t get along without exactly these superpowers. They are all available to Sun’s engineers (and everyone aware of sun.misc.Unsafe, go figure) but not to Joe Average programmer.

In this post, I’ll show how Niko and I are using Smalltalk’s superpowers for the good of Phexample users.

In the initial design of Phexample, boolean expectation matchers had a somewhat awkward syntax, which soon got nicknamed “lolcat syntax” by users

#(Lorem ipsum dolor) should be isEmpty.

The rationale for this design was that we saw no other way to provide users with an error message whose text refers to #isEmpty by name! If we change the syntax to #() isEmpty should be true the name of the boolean property is out of reach (ie, not on the stack) when the matcher fails.

RSpec, by the way, does not suffer from lolcat syntax because Ruby’s boolean properties use a question mark rather than starting with a verb. Users can write %w{Lorem ipsum dolor} should be empty? without getting caught in the uncanny valley of DSL design.

The way out of the uncanny valley is to allow users to put the boolean property first, as in

#(Lorem ipsum dolor) isEmpty should be true.

and then, whenever an expectation fails, walk back in both stack and bytecode to recover the property name. There are two challenges here:

  • to walk back in the stack, from where the expectation fails internally to the method’s stack frame where should be true was sent,
  • to walk back in the bytecode, from the current program counter to the call-site that is lexically before the initial #should call.

To make things more complicated, there can be an arbitrary number of domain messages between #true and #should, as for example in negated expectations.

The first challenge, ie getting back to the stack frame of the DSL call, is solved by walking back the stack until we are outside of the current matcher instance

frame := thisContext.
[ frame := frame sender.
  frame receiver == self ] whileTrue.

thisContext is a pseudovariable, like self and super, that provides access to the current stack frame. That is, the stack frame of the above code snippet. Then, we use #sender to walk up the stack until we reach the latest frame outside of the current matcher instance. In the above comparison frame receiver refers to the value of the “self” of the stack frame, and self to the value of the “self” of the code snippet itself. No one said Superpowers ain’t confusing.
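To get a feeling for what such a walk sees, here is a small exploratory sketch that can be run in a workspace. It uses only thisContext, #sender and #receiver as in the snippet above; the limit of five frames and the variable names are arbitrary choices, and what shows up depends on where the code is evaluated from.

```smalltalk
| frame receivers |
frame := thisContext.
receivers := OrderedCollection new.

"Walk up the stack via #sender, recording the class of each frame's
 receiver. The outermost frame has no sender, so we also stop on nil."
[ frame notNil and: [ receivers size < 5 ] ] whileTrue: [
    receivers add: frame receiver class.
    frame := frame sender ].

"Answer the classes of the innermost (at most five) receivers."
receivers
```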

The second challenge, ie recovering the property name, is solved by recovering all message sends (ie method call-sites) from the bytecode of the method where should be true was sent

scanner := InstructionStream on: frame method.
sent := Stack new.
scanner scanFor: [ :bytecode |
    sent push: scanner selectorToSendOrSelf.
    (frame pc - 1) <= scanner pc ].

So, frame method now refers to the method where should be true was sent. In this method, we scan from the beginning of the bytecode up to where we are, and push all sent messages (ie method names) on a stack. As Niko already mentioned in his “no more lolcats” post: bytecodes in Smalltalk have different sizes, thus we cannot just walk backwards from where we are.

Scanning bytecodes with #scanFor: stops when the last line of the block evaluates to true. Since the current program counter points to the bytecode just after the send-site of true, we subtract one when comparing the counter of where we are with the scanner’s counter.

So now we’ve got all message sends on a stack up to true. We keep dropping message names from the stack, up to and including #should

[ sent isEmpty ifTrue: [ ^'<unknown>' ].
  sent pop == #should ] whileFalse.
sent top isSymbol ifFalse: [ ^'<unknown>' ].
^sent top

which leaves us with the name of the boolean property that was sent before the should be true sequence. Of course, we guard against running out of stack elements while doing so.
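The dropping step can be tried in isolation, without any bytecode scanning. The sketch below fakes the scanner's result with a hand-written list of selectors and uses an OrderedCollection in place of the Stack, so it only illustrates the loop logic; it is not the actual Phexample code.

```smalltalk
| sent property |
"Pretend the scanner pushed these sends, oldest first, for
 '... isEmpty should be true'."
sent := OrderedCollection withAll: { #isEmpty. #should. #be. #true }.

"Drop selectors from the end, up to and including #should,
 and stop early if we run out of elements."
[ sent isEmpty or: [ sent removeLast == #should ] ] whileFalse.

"Whatever is now on top was sent just before #should."
property := sent isEmpty
    ifTrue: [ '<unknown>' ]
    ifFalse: [ sent last ].
property "=> #isEmpty"
```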

Thus now, when executing

#(Lorem ipsum dolor) isEmpty should be true.

we get the message text

TestFailure: expected #isEmpty to be true

IM OUTTA YR LIBRARY.ST
KTHXBYE
