Random notes on software, programming and languages.
By Adrian Kuhn

757 Days


May 8th, 2012

757 days is a long time. Way too long. Expect new content soon.

Software by Night


April 21st, 2010

In this post I’ll present an idea from unpublished work that uses light effects to visualize dynamic information in Software Cartography. (If you are unfamiliar with spatial-representation of software please refer to “A Software cartographer’s Vision” previously featured on this blog.)

“Software is not just structural, but behavioral; so the next thing I would like to see is traffic and people walking.” — Rob Deline, in the question session of Frank Steinbrückner’s presentation at MSA 2010

Software visualizations typically convey a static view of software systems. Software maps are no different. On a software map all code of a system is shown, no matter whether it is used at runtime or is dead code. This is just the same as with Earth when seen from space: by day you see all land masses, but only at night is it revealed where human activity takes place and where it does not. So my idea is to realize the same for software maps:

  • use static information to group the software artifacts into structural clusters, visualized as land masses.
  • use dynamic information to embellish the visualization with behavior activity, visualized as sources of lights.

Light effects, such as flares and glow, make the execution of software visible. What we actually do is to add more dimensions to a two-dimensional visualization, just as Michele Lanza did with his seminal polymetric views. There seem to be three promising dimensions for light effects

  • brightness of light source,
  • blurriness of surrounding glow,
  • intensity of flares that flash up,

that are possibly best used in combination to visualize one or two behavioral properties.

When modelling animated visualizations, the notion of change is essential. Ben Fry’s work on organic visualization is a good source of inspiration for modelling time-dependent values. In his work he proposed integrators as a new numerical data type with first-class support for continuously changing values (e.g. ease-in and ease-out).

Unlike static numerical values, integrators continuously change their value over time. Integrator values live in the continuous rather than the discrete domain: they grow and decay rather than increment and decrement.

Fry associates each visual component with a set of static and changing values, using the data types Numerical and Integrator. He describes an integrator as “a continuously changing value” and defines the following operations on integrators:

  • set() explicitly set the current value of the Integrator. Normally, this is only used to set an initial value for the Integrator.
  • impulse() adds a specified amount of force to the Integrator. Equivalent to incrementing a Numeric value, but executed in the continuous domain, i.e. the amount added attenuates over time.
  • decay() the opposite of an impulse. This is a decrease in the continuous domain. Often used to atrophy values over time.
  • attract() apply a force to move the Integrator towards a particular value. Instead of setting the Integrator to a particular value, a target value is set for the Integrator that it will reach over time. Enables smooth transitions if the target is changing.
  • repel() the opposite of attract, moves the value of the Integrator away from a particular numeric value. If the value being avoided is greater, the Integrator is decreased further. If less, the Integrator is increased.
  • update() this is used internally to update the Integrator’s current value on each time step, after calculating a new velocity based on the forces that have been applied to the Integrator by the rules that affect it.
  • reset() after the current value is updated, the forces are cleared. New forces are added on each time step by re-applying the rules.
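
The operations listed above can be sketched in a few lines of Python. This is only a minimal illustration and not Fry’s implementation: the damping, attraction and mass parameters, their default values, and the constant-strength form of repel() are assumptions made for this sketch.

```python
class Integrator:
    """A continuously changing value driven by forces (illustrative sketch)."""

    def __init__(self, value=0.0, damping=0.5, attraction=0.2, mass=1.0):
        # damping, attraction and mass are assumed tuning knobs
        self.value = value
        self.velocity = 0.0
        self.force = 0.0
        self.damping = damping
        self.attraction = attraction
        self.mass = mass

    def set(self, value):
        # Explicitly set the current value, typically only once at the start.
        self.value = value
        self.velocity = 0.0

    def impulse(self, amount):
        # Add force; its effect on the value fades out through damping.
        self.force += amount

    def decay(self, amount):
        # The opposite of an impulse.
        self.force -= amount

    def attract(self, target):
        # Pull towards the target, proportionally to the remaining distance.
        self.force += self.attraction * (target - self.value)

    def repel(self, avoided):
        # Push away from the avoided value with a constant strength.
        if avoided > self.value:
            self.force -= self.attraction
        elif avoided < self.value:
            self.force += self.attraction

    def update(self):
        # Turn the accumulated forces into a new velocity and value.
        acceleration = self.force / self.mass
        self.velocity = (self.velocity + acceleration) * self.damping
        self.value += self.velocity
        self.reset()

    def reset(self):
        # Forces are cleared; rules must re-apply them on the next time step.
        self.force = 0.0


# Rules are re-applied on every time step, here a single attract rule.
brightness = Integrator()
for _ in range(100):
    brightness.attract(10.0)
    brightness.update()
```

With these defaults the value eases towards the target instead of jumping to it, which is the smooth transition that attract() is meant to provide.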

To implement software by night, integrators are well suited for adding light effects to the still view of a static software map. So far we have formalized two light dimensions, flare and glow. When a software artifact is used, it flashes up with a sudden impulse to attract the user’s attention and then decays into a slowly disappearing glow. Calls are visualized by sparks that jump from call site to call site. Users can thus see where action is currently taking place, but also where past action has taken place and how often.
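
As a rough illustration of the flare and glow behavior described above, here is a deliberately simplified Python sketch that uses plain exponential decay instead of full integrators. The class name ArtifactLight, the decay factors and the glow increment are all hypothetical values chosen for this example, not taken from the prototype.

```python
from dataclasses import dataclass


@dataclass
class ArtifactLight:
    """Light state of one software artifact on the map (hypothetical model)."""

    flare: float = 0.0  # short, bright flash when the artifact is used
    glow: float = 0.0   # slowly fading trace of past activity

    def used(self):
        # A use triggers a full flare and adds to the accumulated glow,
        # so frequently used artifacts end up glowing more strongly.
        self.flare = 1.0
        self.glow += 0.2

    def tick(self, flare_decay=0.5, glow_decay=0.98):
        # Called once per animation frame: the flare fades fast, the glow slowly.
        self.flare *= flare_decay
        self.glow *= glow_decay


light = ArtifactLight()
light.used()
for _ in range(10):
    light.tick()
```

After ten frames the flare has almost vanished while most of the glow is still there, which gives the "current action" versus "past action" distinction.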

Please note that “Software by Night” is not yet available in the downloadable Codemap plug-in. I am working on an external prototype for an upcoming publication, though. The missing integration in Eclipse is mainly due to the lack of an easily accessible source of dynamic information within Eclipse.

To learn about new Codemap releases, follow http://twitter.com/codemap

Happy Hacking!

Exploring the Layout of Software Maps


April 1st, 2010

In this post I will cover my current work on Software Cartography. If you are unfamiliar with spatial-representation of software please refer to “A Software cartographer’s Vision” previously featured on this blog.

Codemap is available for download as Eclipse plug-in.

A major issue of software cartography is the base layout of software maps. A good base layout is both stable over time and conveys a meaningful grouping of software artifacts into islands (i.e. clusters). My initial attempt was to base the layout on the vocabulary found in the source code, that is, identifier names and comments. Vocabulary has the nice property that it is more stable over time than structure, and that it naturally conveys a meaningful clustering of latent topics.
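
To make the idea of a vocabulary-based distance concrete, here is a small Python sketch that splits identifiers into terms and compares two sources by cosine distance of their term counts. It is only an illustration of the general idea and not the pipeline Codemap uses; the function names terms and cosine_distance are made up for this example.

```python
import math
import re
from collections import Counter


def terms(source):
    """Split source text into lowercase terms, breaking up camelCase names."""
    words = re.findall(r"[A-Za-z][a-z]+|[A-Z]+(?![a-z])", source)
    return Counter(word.lower() for word in words)


def cosine_distance(a, b):
    """Return 0.0 for identical term distributions, 1.0 for disjoint ones."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(count * count for count in a.values()))
    norm_b = math.sqrt(sum(count * count for count in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 1.0  # an artifact without vocabulary is far from everything
    return 1.0 - dot / (norm_a * norm_b)


# Two artifacts sharing the term "order" are closer than unrelated ones.
near = cosine_distance(terms("parseOrderLine"), terms("printOrder"))
far = cosine_distance(terms("parseOrderLine"), terms("renderMap"))
```

Pairwise distances like these could then be fed into any layout algorithm that places similar artifacts close to one another.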

However, a first user study has shown that a vocabulary-based layout does not meet the developers’ intuition. Even though the developers were well aware that the layout was based on vocabulary, they drew conclusions that assumed a structure-based layout. We learnt this from the think-aloud protocol used in the user study.

The user study included six developers who each had 1.5 hours to explore an unknown software system. They were given six tasks of increasing complexity, culminating in fixing an actual bug.

I will cover the user study in more detail in a future blog post. In this post I shall discuss some proposals for alternative layouts…

* * *

Package-based layout: this is typically the first layout that comes to people’s mind when they hear about software cartography. In fact there is a rich set of previous work that uses package-based layout. For example, Codecity uses a treemap layout to visualize software systems based on their package structure.

A treemap layout makes good use of screen space, and using packages is likely to convey a meaningful clustering (ignoring for the moment that Java’s package nesting is a mere naming convention and bears no language semantics whatsoever).

However, package-based layouts are not stable in the face of change. They are gapless, and thus major parts of the map have to be moved aside in order to make space for new elements (and vice versa for disappearing elements). Codecity works around this limitation by offering an after-the-fact analysis mode where the layout of past snapshots anticipates the latest state of the system. However, this is only applicable to post-mortem analysis and not to tools that are embedded in a development environment with live changes.

Callgraph-based layout: this is the other common layout in software visualization. Static analysis is used to find the call relations between software artifacts, and then standard force-based graph drawing is applied to visualize the system. Force-based graph drawing adapts well to change; however, this advantage is cancelled out by call relations having a very high change frequency.

Call relations are well understood by developers and possibly quite close to developers’ intuitive understanding of distance in software systems. Personally, however, I would prefer the map to be based on a more abstract distance measure.

For example, it would be desirable if call relations that are displayed on the map had a meaningful interpretation. A long-distance call should have a diagnostic interpretation. Given a layout based on call graphs, however, a long-distance call would just indicate the failure of the force-based layout to find a solution where all calls have the same length.

Law-of-Demeter layout: the “Law of Demeter” (LOD) is a guideline in software design. It states that each method should only talk to its friends, which are defined as its class’s fields, its local variables and its method arguments. Based on this we can define an idealized call-based distance between software artifacts.

Given a LOD-based layout, software artifacts are close to one another if they are supposed to call one another and far apart if they ought not to call one another. Thus we get the desired property that visualizing call graphs conveys meaningful arrow distances. Also, compared to a raw call graph, a LOD-based graph is less connected and thus better suited for graph drawing.

Best of all: on a LOD-based map, any long-distance call has a diagnostic interpretation that helps developers to take action: a long call possibly violates the “Law of Demeter”!
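
One possible reading of such an idealized distance is sketched below in Python: artifacts that are declared friends are one hop apart, everything else is as far apart as the shortest chain of friends between them, and any call that spans more than one hop is flagged. The friends table, the class names and both function names are hypothetical, and treating friendship as symmetric is an assumption made here so that the result can serve as a layout distance.

```python
from collections import deque


def lod_distances(friends, start):
    """Hop counts from start over the (symmetric) friend graph, via BFS."""
    graph = {}
    for artifact, its_friends in friends.items():
        for friend in its_friends:
            graph.setdefault(artifact, set()).add(friend)
            graph.setdefault(friend, set()).add(artifact)
    graph.setdefault(start, set())
    distances = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in graph[node]:
            if neighbour not in distances:
                distances[neighbour] = distances[node] + 1
                queue.append(neighbour)
    return distances


def long_distance_calls(friends, calls):
    """Calls whose endpoints are more than one hop apart (or unreachable)."""
    flagged = []
    for caller, callee in calls:
        distance = lod_distances(friends, caller).get(callee)
        if distance is None or distance > 1:
            flagged.append((caller, callee))
    return flagged


# Hypothetical system: Order holds a Customer, Customer holds an Address.
friends = {"Order": {"Customer"}, "Customer": {"Address"}}
calls = [("Order", "Customer"), ("Order", "Address")]
flagged = long_distance_calls(friends, calls)
```

Here the call from Order to Address reaches through Customer, so it shows up as a long arrow on the map and is a candidate violation.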

History-driven layout: Claus Lewerentz and Frank Steinbrückner developed a layout for software cities that is based on historical data. They start out with a central plaza; each new package then branches off as a new street, and each new class is a building along these streets. This generates visually awesome and intuitively stable software maps.

Their work has been (first?) presented at the recent MSA 2010 workshop and will be published soon. I will cover their work in more detail when their publication appears. For the moment please refer to this slide deck of Frank Steinbrückner’s MSA presentation.

Test-dependency-based layout: despite Agile claims, unit tests do depend on one another. We can record these dependencies using dynamic instrumentation and profiling, and establish a partial order of unit tests. Based on that partial order we can do a radial tree layout of tests and place each software artifact closest to its corresponding tests (kernel method).

A test-dependency-based layout is stable, since changes to test code are typically less frequent than changes to the code under test. This holds in particular for changes to the dependencies between tests.

Also, a test-dependency-based layout has a clear diagnostic interpretation. Along any radial axis, software artifacts on the inside provide services to their clients on the outside. Thus the map’s layout is like a cell, with API interfaces on the outside and a basic kernel at the core.

* * *

An important criterion that has not yet been discussed is “ease of retrieval” (there might be a proper term in data mining for that). What I mean is the availability and accessibility of the required data. For example, vocabulary data is always available and easy to access (you don’t even need to parse the source code), while historical data and dynamic instrumentation are often either not available or not easily accessible. So far, I’ve thus only implemented the vocabulary-based layout (featured in the current download of Codemap) and a prototype of the Law-of-Demeter-based layout.

In this post I’ve only scratched the surface of possible map layouts and their design space. For example, Niko Schwarz took inspiration from Richard Dawkins and proposed code hospitality as a foundation for software maps. Code hospitality is a measure for how likely snippets of one class are to run when copied into another class.

If you’ve got your own crazy ideas, tweet or blog about it!

I’ll collect all pingbacks in a future blog post.

Pharo Superpower: Respond to any Message


January 19th, 2010

For the fifth week in a row we’re stepping into the Pharo superpowers booth. Today we shall learn how to create objects that respond to any message. That is, objects that respond to a message without implementing a corresponding method. Again, as with sending any message, this superpower can be used for the good (if used with care) and I will thus discuss an example that I consider good use below.

When a message is sent to a Smalltalk object, the message name is looked up in the method dictionary of the object’s class and its superclasses. If a method whose name matches the message is found, that method is executed. However, if no matching method is found a special message is sent to the object, which is

#doesNotUnderstand:

By default, the implementation of #doesNotUnderstand: opens a debugger (or more precisely, the pre-debugger dialog that we all know from test-driven development). However, we are free to override #doesNotUnderstand: and thus respond to any unknown message.

As a (dadaistic) example, let’s implement a Lorem ipsum object

Object subclass: #Lorem instanceVariableNames: 'expects'

with an ipsum constructor

Lorem class >> ipsum
    ^ self new

and the following two methods

Lorem >> initialize
    expects := #(dolor sit amet "to be continued ad nauseam..." nil)

Lorem >> doesNotUnderstand: aMessage
    ^ aMessage selector == expects first
        ifTrue: [ expects := expects allButFirst. self ]
        ifFalse: [ super doesNotUnderstand: aMessage ]

So, if you ever doubted that virtually any English sentence is valid Smalltalk, here is your proof :)

Lorem ipsum dolor sit amet.

This executes as valid Smalltalk code, without ever having implemented any #dolor, #sit or #amet method! If, however, we deviate from the canonical Lorem ipsum sequence we’ll get the usual MessageNotUnderstood error.

[ Lorem ipsum dolor zork ] should signal: MessageNotUnderstood.

As a more sensible example, let’s consider a list that responds to any message understood by all its elements.

OrderedCollection subclass: #Group.

Group >> eachRespondsTo: aSelector
    ^ self allSatisfy: [ :each | each respondsTo: aSelector ]

Group >> doesNotUnderstand: aMessage
    ^ (self eachRespondsTo: aMessage selector)
        ifTrue: [ self collect: [ :each | aMessage sendTo: each ] ]
        ifFalse: [ super doesNotUnderstand: aMessage ]

As you can see, the implementation of #doesNotUnderstand: follows the same pattern as above. We check whether we want to handle the message, and if not, we delegate to the default implementation in Object (which will open a pre-debugger dialog).

Keen readers might have already noted a limitation of the above approach: when you override #doesNotUnderstand: but not #respondsTo:, your object will respond to a new message (by means of #doesNotUnderstand:) but still insist that it does not respond to that message when queried with #respondsTo:.

So we’ll have to override #respondsTo: as well

Group >> respondsTo: aSelector
    ^ (super respondsTo: aSelector) or: [ self eachRespondsTo: aSelector ]

It is a sad but true fact, that over 90% of all #doesNotUnderstand: overriders that you’ll find out there do not override #respondsTo: as well—even though they should!

So now our new class is ready for a bunch of expectations (please refer to Phexample for more details on expectation matchers)

g := Group new.
g should not respondTo: #x.
[ g x ] should raise: MessageNotUnderstood.
g add: 2 @ 3.
g add: 3 @ 4.
g add: 1 @ 2.
g should respondTo: #x.
g x should beSameSequence: #(2 3 1).
g y should beSameSequence: #(3 4 2).

BTW, if you are an OSX user and looking for a language that provides this feature by default, take a look at F-Script by Philippe Mougin. F-Script also offers a plethora of awesome features beyond projection of messages, for example it allows you to manipulate the Cocoa objects of any OSX application—at runtime!

As a best practice, you should only override #doesNotUnderstand: and #respondsTo: on your own classes. Just imagine what might happen when two or more stakeholders attempt to override #doesNotUnderstand: in, for example, Collection: only one of the extensions will eventually remain, leaving the system in an undefined state with overwritten extensions.

If you know more good (or evil, uoarharhar) uses of #doesNotUnderstand: share them in the comments.

Hackety hacking!

Pharo Superpower: Send Any Message


January 12th, 2010

In Pharo Smalltalk you may send any message even if its name is not known at compile time. Sending any message is one of the superpowers that can be used for the good, even when doing application programming; therefore I will discuss best practices at the end.

First of all, recall that “sending a message” is Smalltalk jargon for calling a method. Since sending a message is synchronous in Smalltalk, i.e. it blocks until the receiver returns, it is basically the same as a method call, and the actual difference is thus of a philosophical nature only. (There is an implementation difference deep down at the language’s core, but that shall not be discussed today as it does not matter to programmers.)

BTW, this is the fourth post in the Superpower series.

There are many ways to send any message, mainly due to optional and variable arguments which are not well supported by Smalltalk syntax. The most basic form is

object perform: #symbol withArguments: anArray

Let’s consider a real example

string := 'Lorem'.
string perform: #size. "=> 5"
string perform: #at: with: 1. "=> $L"
string perform: #copyFrom:to: with: 2 with: 4. "=> 'ore'"

If the number of arguments is not known at compile time, we may use

string perform: aSymbol withArguments: anArray.
string perform: aSymbol withEnoughArguments: anArray.

The first expects that the array matches the arity (number of arguments) of the target method; the latter just uses as many arguments as are required. This is most useful for sending a message with optional arguments.

So when is sending any message for the good or for the evil?

Whenever possible, try to avoid using #perform: because it is less readable. When a reader of your program looks at a #perform:, it is not obvious which message is being sent at runtime. Also, messages that are sent with #perform: will not be shown when browsing all senders of a message. There is one subtle difference here: if the dynamically sent message is stored somewhere as a symbol, at least that symbol will show up when looking for senders. If, however, the dynamically sent message is composed using string concatenation, it won’t show up at all. It might even seem as if its implementers are never used, which can be very confusing to the reader.

For all the above reasons you should only use #perform: when you have good reason to do so. And if you use it, make sure that the dynamically sent messages are all stored as symbols somewhere else. Best of all, make sure that all code involved in dynamically sending messages is encapsulated in one single class.

I will provide an example from Hapax’s clustering algorithm. When you do hierarchical clustering, there are different ways to link small clusters into large clusters. The call to this linkage method is buried deep down in the internals of the clustering algorithm, so the ClusteringEngine class uses a strategy pattern to pick the right linkage method. The choice of strategy is stored as a symbol in an instance variable and then used as follows

Object subclass: #ClusteringEngine
    instanceVariableNames: 'distanceMatrix dendrogram linkage'
    classVariableNames: ''

ClusteringEngine >> linkage: aSelector
    linkage := aSelector

ClusteringEngine >> linkage
    ^ linkage

ClusteringEngine >> allLinkageSelectors
    ^ #( averageLinkage centroid completeLinkage meanLinkage singleLinkeage wardsMethod )

ClusteringEngine >> run
    (distanceMatrix size - 1) timesRepeat:
        [self findMinimum.
        self perform: linkage].

ClusteringEngine >> averageLinkage
    "implementation omitted..."

"et cetera..."

All code is encapsulated within one class, so that a reader can find it all in one place when browsing the source code.

Any other use of #perform:, in particular when string concatenation of selectors is involved, is evil and should be limited to library design, if used at all.

A note regarding performance: using #perform: is as fast as sending a message the normal way. So contrary to popular belief there is no performance penalty, at least not in Pharo Smalltalk (in other dialects that do use JIT compilation there might be severe performance penalties, though).

Hackety hacking!

Imagine, IDE search so faaaaast that…


January 9th, 2010

Imagine an IDE where search were so fast that it became the sole means of navigation.

In such a system, one would not write //TODO but just this.todo(), since browsing the callers of todo is so fast that it is faster than using a dedicated task view. The system might even be hidden from your Yahoogle query for “fast search in IDE”, since in such a system search is so fast and easy to use that it ceases to be used as a verb. For example, the devs might speak of “browse callers” rather than “search callers”, since no intermediate search step stands between them and their need.

I can see some of my readers smile now :) …because…

There is such a system. It is little known since it predates the invention of today’s filesystems and has never made the transition to file based software development itself. It is Smalltalk, the old lady of dynamic programming languages.

In the IDE of Smalltalk “browse callers of” and “browse implementers of” are the main means of navigation. Executing these actions opens, in the same instant, a new editor window with all callers (or implementers) of a given method. No spinning wheel, no tree list of search results, no browsing of results even; you are right there and can start editing.

NB: in fact, seasoned Smalltalk devs even omit “browse” and just say “senders of” (in message-oriented languages such as Smalltalk and Ruby, objects don’t call methods but send each other messages) and “implementers of” instead of a proper verb.

For the alert reader: yes, graphical UIs predate hierarchical file systems. And even better, Smalltalk invented graphical windows. But we’ll stop the children’s games here. It does not matter who was first, but who makes the best out of it. And there, the winner is obvious.

The point I want to make is rather that there is a system out there with a 30-year head start in IDE search. So as researchers we can go and learn from the experience of that community, and then use what we learned to advance the state of current IDEs beyond it. Breakpoints, for example, are also just another method call in Smalltalk. You insert a call to #halt() wherever you want, and to view the list of current breakpoints you browse all callers of halt. Again, no need for a dedicated view. As you see, search-driven development simplifies your tool set.

Of course, not all your navigation needs can be satisfied by hyperjumping. Sometimes you need to drill down from top-level packages to methods. To do so Eclipse offers the code browsing perspective, which is however never used because the package explorer view offers the same drill-down capabilities without a change of perspective. In Smalltalk we get a code browsing interface as well. In fact, Eclipse inherited that perspective from its predecessor VisualAge, which was IBM’s prime Smalltalk IDE before they switched to Java.

So before I start to tell the story of how Eclipse’s elimination of the compilation step was inherited from VisualAge as well, lemme summarize this post.

  • Search so fast that it disappears from the list of “verbs” in your IDE.
  • Search so fast that it is called “browse code” instead.
  • Search so fast that developers, for example, use method calls as TODO markers.
  • Plus a drill-down interface for the remaining navigation needs that are not covered by hyperjumping.

The comparison with compilation is actually quite nice: With Eclipse “compile” and “build” ceased to be used as verbs in Java development. Now devs just execute code, done. This feature was brought to Java from Smalltalk. It would be awesome if we could achieve the same kind of “knowledge transfer” for IDE search.

I’d say that our job as providers of IDE search is only done when search ceases to be used as a verb in software development.

— that said, paper submission for SUITE is open until January 19, 2010.

 

Sneak Peek: Codemap User Study


January 8th, 2010

Here is a preview of a recently submitted paper on Software Cartography. We report on preliminary results of an ongoing user study with both students and professional developers. Results are mixed and led us to revise the assumption that lexical similarity is sufficient to lay out the map. We are now working on a new distance metric that includes the ideal structural proximity proposed by the “Law of Demeter”. We are also looking into new layout algorithms. For example, anchored MDS would allow developers to rearrange the map according to their system’s architecture.

NB if you are a professional developer from Switzerland, we encourage you to participate in our user study. Please contact David or me for more information.

[Preview image: “Towards an Improved Mental Model of Software Developers through Cartographic Visualization”, ICSE NIER 2010]

The preview was created with Wordle, an idea that I owe to Tom Zimmermann.
