Method Namespaces for Smalltalk?
Packaging for Smalltalk is in the air. The topic emerged twice today, first at lunch when discussing Pharo with Adrian and Lukas, and later when Toon showed me CaesarJ. In both discussions, I argued that we need method namespaces for Smalltalk.
Class reopening, that is, extending an existing class with new methods, is a well-known technique. In Smalltalk, classes are typically reopened using a package extension mechanism, whereas Python and Ruby use “monkey patching” (even though modules might be a better solution).
The problems that arise from these approaches are discussed in detail by Gilad Bracha on his blog, so I will start in medias res with a simple example. Imagine that two libraries, A and B, both extend the String class with an #asURL method. Two implementations of the same method are provided. Thus the problem: which implementation is used? Or more precisely, on what kind of context does the choice between these two implementations depend?
As it is now in Smalltalk, the implementation loaded last overrides the earlier one, and thus (assuming the load order is first A, then B) the definition of #asURL in B wins and is always used. This is obviously broken, since all code in A and all clients that expect A’s implementation are likely to fail.
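To make the load-order problem concrete, here is a minimal sketch in Python, which has the same "last definition wins" behavior when a class is monkey patched. The class Text and the two loader functions are hypothetical stand-ins for String and for loading packages A and B (Python's built-in str cannot be patched, hence the subclass), and the two URL schemes are invented for the illustration.

```python
class Text(str):
    """Hypothetical stand-in for Smalltalk's String."""


def load_library_a():
    # Library A extends Text with its own as_url (assumed behavior).
    Text.as_url = lambda self: "http://" + self


def load_library_b():
    # Library B extends Text with a different as_url (assumed behavior).
    Text.as_url = lambda self: "https://" + self


# Load order: first A, then B. B silently replaces A's definition.
load_library_a()
load_library_b()

result = Text("example.org").as_url()
print(result)
```

Every caller, including the code inside library A, now gets B's version; reversing the two load calls would flip the outcome.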
In Classboxes, the version is picked (take a deep breath) by the imports of the lexical scope in which the current thread was started. This is based on an IM conversation with Alex; it was not obvious to me from his thesis. I have never used this solution in practice, but it does not seem to address the problem at hand. First, usage of libraries is typically not separated by threads, that is, we typically want to use both A and B from the same thread. Second, the principle of least surprise is violated, as we cannot tell which version is picked just by looking at the code (i.e. from lexical scope alone).
There is another approach, taken by C#’s extension methods. Each extension method is scoped within its namespace, that is, the #asURL method of A is picked for all calls from the lexical scope of A, and the #asURL method of B is picked for all calls from the lexical scope of B. Whenever some client code wants to call #asURL, it must explicitly import the method either from A or from B. This satisfies both the constraint that neither A nor B breaks the other or the other’s client code, and the principle of least surprise.
How can this be applied to Smalltalk?
- The name of every compiled method is prefixed with its lexical scope, that is, the name of the enclosing package. For example, #A_asURL if the method #asURL is defined by the package A, and correspondingly #B_asURL for B.
- The compiler prefixes every message send with the lexical scope of the current method, that is, the name of the enclosing package. For example, #A_asURL if #asURL is sent from package A, or #Foo_asURL if #asURL is sent from a client package Foo.
- When a client package, for example Foo, imports #asURL from a provider package, let's say A, a hidden method #Foo_asURL is created that delegates to #A_asURL.
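The three rules above can be sketched as follows. This is only a simulation in Python, not Smalltalk compiler code: the helpers define, import_method and send are hypothetical names standing in for what the compiler and package loader would do, Text again stands in for String, and the method bodies are invented.

```python
class Text(str):
    """Hypothetical stand-in for Smalltalk's String."""


def define(cls, package, selector, function):
    # Rule 1: a method is compiled under the name <package>_<selector>.
    setattr(cls, f"{package}_{selector}", function)


def import_method(cls, client, provider, selector):
    # Rule 3: importing creates a hidden <client>_<selector> method
    # that delegates to <provider>_<selector>.
    target = f"{provider}_{selector}"

    def delegate(self, *args):
        return getattr(self, target)(*args)

    setattr(cls, f"{client}_{selector}", delegate)


def send(receiver, sender_package, selector, *args):
    # Rule 2: a send is rewritten to <sender package>_<selector>.
    return getattr(receiver, f"{sender_package}_{selector}")(*args)


define(Text, "A", "asURL", lambda self: "http://" + self)
define(Text, "B", "asURL", lambda self: "https://" + self)
import_method(Text, "Foo", "A", "asURL")

url = Text("example.org")
from_a = send(url, "A", "asURL")      # sent from inside package A
from_b = send(url, "B", "asURL")      # sent from inside package B
from_foo = send(url, "Foo", "asURL")  # sent from client Foo, which imported from A
print(from_a, from_b, from_foo)
```

In this sketch, a package that sends #asURL without importing it has no prefixed method to dispatch to, so the send fails with an AttributeError, roughly the analogue of doesNotUnderstand.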
Of course, Browser, Debugger and Inspector should be updated to reflect and properly display this naming convention. For example, a code editor should show the names without prefix, whereas the debugger probably should display the extensions. The latter could also be controlled by a boolean setting. The generated code should run fine in any image.
October 6th, 2008 at 17:12
Modular Smalltalk, Smallscript/S# and more recently Gemstone provide method namespaces. Although very appealing, these solutions do not support reentrance [*]. Essentially, being non-reentrant means that the extended class does not benefit from the extension defined on it, even for self-calls. Reentrance is a desirable property, since it gives the least astonishing behavior.
I have always been against relying on the method call stack, but nobody has done better at preserving reentrance so far.
I designed a new model that supports reentrance without relying on the method call stack. It introduces a flattening, as we have in traits. I can send a draft on request alexandre @. bergel.eu (remove the unnecessary spaces and the dot).
[*] Section 6 page 13 and 14 in http://www.iam.unibe.ch/~scg/Archive/Papers/Berg05cModuleDiversity.pdf
October 6th, 2008 at 18:13
Suppose the class Object implements asString to print “object”. The subclass A of Object has asString in its interface. In scope X you don’t overwrite the asString method in A. In scope Y you do overwrite the method in A with something that returns “overwritten”. Now you have an instance n created in scope X and passed to scope Z, which doesn’t import anything, and also an instance m created in scope Y and passed to scope Z.
Now if in Z you call “asString” on both objects; what do you expect?
If I follow the reasoning in this text, you expect “object” for both objects, which you describe as “least surprise”, since scope Z has never heard of the non-imported overwritten method from scope Y. Z_asString will be called, which internally calls the only version it knows, namely MAINSCOPE_asString (or whatever scope it was defined in).
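This reading can be spelled out in a small Python simulation. It is only one possible interpretation of the scheme in the post: the helpers import_method and send are hypothetical, the prefix MAIN is shortened from the MAINSCOPE placeholder above, and the class name Object_ is a stand-in for Object.

```python
class Object_:
    """Hypothetical stand-in for Smalltalk's Object."""

    def MAIN_asString(self):
        return "object"


class A(Object_):
    pass


def import_method(cls, client, provider, selector):
    # Hidden <client>_<selector> method delegating to <provider>_<selector>.
    target = f"{provider}_{selector}"
    setattr(cls, f"{client}_{selector}", lambda self: getattr(self, target)())


def send(receiver, sender_scope, selector):
    # A send is rewritten to <sender scope>_<selector>.
    return getattr(receiver, f"{sender_scope}_{selector}")()


# Scopes X and Z only know the original definition.
import_method(Object_, "X", "MAIN", "asString")
import_method(Object_, "Z", "MAIN", "asString")
# Scope Y overwrites asString in A.
A.Y_asString = lambda self: "overwritten"

n = A()  # created in scope X
m = A()  # created in scope Y

seen_from_z = [send(n, "Z", "asString"), send(m, "Z", "asString")]
seen_from_y = send(m, "Y", "asString")
print(seen_from_z, seen_from_y)
```

The scope in which n and m were created plays no role here: only the sender's scope selects the method, so Z sees “object” for both instances while Y sees “overwritten”.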
For me, following normal polymorphic behavior, the instances were created with a certain *version* of a class in mind (in a specific scope). Even if you pass these objects outside your scope, you still expect the behavior to be linked to the version of the class that you had. Actually, in a way you want the instance to be an instance of a “new subclass of A”, but one which still has the same superclass, because that one might also be extended in the current scope. You want to “copy and change” the class hierarchy without actually copying it.
This kind of behavior is especially necessary when you are working with scopes containing collection hierarchies; from where you want to jump back to your specific scope for calls such as “asString”. You don’t want to import all possible asString-containing scopes into those scopes. Nor do you want to duplicate the whole collection-hierarchy-containing scope in a new child-scope of the scope in which you changed the “asString” method; just so it can know about your changes.
At least for me, the person creating the instance knows what he is doing. The person using the instance, not necessarily. As long as the interface is what you expect, you can call whatever you want; the instance knows how to react to it. To me, this is less surprising than just executing whatever the scope from which a message is sent knows about. The latter almost sounds like procedural code with structs to me, rather than OO as we know it.