Random notes on software, programming and languages.
By Adrian Kuhn

Archive for the ‘Software Cartography’ Category

Exploring the Layout of Software Maps


Thursday, April 1st, 2010

In this post I will cover my current work on Software Cartography. If you are unfamiliar with the spatial representation of software, please refer to “A Software Cartographer’s Vision”, previously featured on this blog.

Codemap is available for download as an Eclipse plug-in.

A major issue of software cartography is the base layout of software maps. A good base layout is both stable over time and conveys a meaningful grouping of software artifacts into islands (i.e. clusters). My initial attempt was to base the layout on the vocabulary found in the source code, that is, identifier names and comments. Vocabulary has the nice property that it is more stable over time than structure, and that it naturally conveys a meaningful clustering of latent topics.
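To make the idea of vocabulary-based distance concrete, here is a minimal sketch. It is not Codemap's actual pipeline: it assumes plain term counts and cosine distance, and the names `terms` and `cosine_distance` as well as the sample classes are made up for illustration.

```python
import math
import re
from collections import Counter


def terms(source: str) -> Counter:
    """Count the lower-cased terms found in identifiers and comments."""
    parts = []
    for word in re.findall(r"[A-Za-z]+", source):
        # Split camelCase and PascalCase identifiers into their components,
        # keeping acronyms such as "XML" together.
        parts.extend(re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+", word))
    return Counter(part.lower() for part in parts)


def cosine_distance(a: Counter, b: Counter) -> float:
    """Return 0.0 for identical vocabulary and 1.0 for disjoint vocabulary."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return 1.0 - dot / norm if norm else 1.0


# Hypothetical source snippets, only here to exercise the functions.
ui = terms("class LoginPage { void renderSessionTimeout() {} }")
db = terms("class SessionStore { void checkSessionTimeout() {} }")
xml = terms("class XMLParser { void parseXMLFile() {} }")
print(cosine_distance(ui, db), cosine_distance(ui, xml))
```

In this toy example the two classes that talk about session timeouts end up closer to each other than either is to the XML parser, regardless of which package they live in.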

However, a first user study has shown that a vocabulary-based layout does not match the developers’ intuition. Even though the developers were well aware that the layout was based on vocabulary, they drew conclusions that assumed a structure-based layout. We learnt this from the think-aloud protocol used in the user study.

The user study included six developers who each had 1.5 hours to explore an unknown software system. They were given six tasks of increasing complexity, culminating in fixing an actual bug.

I will cover the user study in more detail in a future blog post. In this post I shall discuss some proposals for alternative layouts…

* * *

Package-based layout: this is typically the first layout that comes to people’s mind when they hear about software cartography. In fact there is a rich body of previous work that uses package-based layouts. For example, Codecity uses a treemap layout to visualize software systems based on their package structure.

A treemap layout makes good use of screen space, and using packages is likely to convey a meaningful clustering (ignoring for the moment that Java’s package nesting is a mere naming convention and bears no language semantics whatsoever).

However, package-based layouts are not stable in the face of change. They are gapless, and thus major parts of the map have to be moved aside to make space for new elements (and vice versa for disappearing elements). Codecity works around this limitation by offering an after-the-fact analysis mode where the layout of past snapshots anticipates the latest state of the system. However, this only applies to post-mortem analysis, not to tools that are embedded in a development environment with live changes.
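The instability of a gapless layout can be seen in a few lines. The sketch below is not Codecity's algorithm; it assumes a single row of a slice-and-dice treemap, and `slice_layout` and the package names are hypothetical.

```python
from typing import Dict, Tuple

Rect = Tuple[float, float, float, float]  # x, y, width, height


def slice_layout(sizes: Dict[str, float],
                 width: float = 100.0,
                 height: float = 10.0) -> Dict[str, Rect]:
    """Place items left to right without gaps, width proportional to size."""
    total = sum(sizes.values())
    x = 0.0
    rects: Dict[str, Rect] = {}
    for name in sorted(sizes):
        w = width * sizes[name] / total
        rects[name] = (x, 0.0, w, height)
        x += w
    return rects


before = slice_layout({"core": 40, "ui": 40, "util": 20})
# A new package "net" appears between two snapshots of the system.
after = slice_layout({"core": 40, "net": 20, "ui": 40, "util": 20})

# Every package that existed before now has a different rectangle.
moved = [name for name in before if before[name] != after[name]]
print(moved)
```

Adding one package changes the rectangle of every existing package, which is exactly the kind of movement a stable map should avoid.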

Callgraph-based layout: this is the other common layout in software visualization. Static analysis is used to find the call relations between software artifacts, and then standard force-based graph drawing is applied to visualize the system. Force-based graph drawing adapts well to change; however, this advantage is canceled out by call relations having a very high change frequency.
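For readers who have not seen force-based graph drawing, here is a rough sketch of a spring embedder. It assumes the common scheme in which all nodes repel each other and call edges attract; the constants, the cooling schedule and the names `force_layout` and `distance` are my own choices for illustration, not those of any particular tool.

```python
import math
from typing import Dict, Sequence, Tuple


def force_layout(nodes: Sequence[str],
                 edges: Sequence[Tuple[str, str]],
                 steps: int = 200,
                 ideal: float = 1.0) -> Dict[str, Tuple[float, float]]:
    """Spring-embedder sketch: every pair repels, call edges attract."""
    # Deterministic start: spread the nodes evenly on a unit circle.
    pos = {n: [math.cos(2 * math.pi * i / len(nodes)),
               math.sin(2 * math.pi * i / len(nodes))]
           for i, n in enumerate(nodes)}
    for step in range(steps):
        disp = {n: [0.0, 0.0] for n in nodes}
        for a in nodes:  # repulsion between every pair of nodes
            for b in nodes:
                if a == b:
                    continue
                dx, dy = pos[a][0] - pos[b][0], pos[a][1] - pos[b][1]
                d = math.hypot(dx, dy) or 1e-9
                disp[a][0] += dx / d * ideal ** 2 / d
                disp[a][1] += dy / d * ideal ** 2 / d
        for a, b in edges:  # attraction along call edges
            dx, dy = pos[a][0] - pos[b][0], pos[a][1] - pos[b][1]
            d = math.hypot(dx, dy) or 1e-9
            pull = d ** 2 / ideal
            disp[a][0] -= dx / d * pull
            disp[a][1] -= dy / d * pull
            disp[b][0] += dx / d * pull
            disp[b][1] += dy / d * pull
        limit = 0.1 * (1 - step / steps)  # cooling: cap the move per step
        for n in nodes:
            fx, fy = disp[n]
            mag = math.hypot(fx, fy) or 1e-9
            move = min(mag, limit)
            pos[n][0] += fx / mag * move
            pos[n][1] += fy / mag * move
    return {n: (p[0], p[1]) for n, p in pos.items()}


def distance(layout: Dict[str, Tuple[float, float]], a: str, b: str) -> float:
    """Euclidean distance between two placed nodes."""
    return math.dist(layout[a], layout[b])


# Hypothetical call graph: a calls b, c calls d.
layout = force_layout(["a", "b", "c", "d"], [("a", "b"), ("c", "d")])
print(distance(layout, "a", "b"), distance(layout, "a", "c"))
```

Artifacts that call each other are pulled together while unrelated ones drift apart, so every added or removed call edge changes the forces and can shift the whole picture.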

Call relations are well understood by developers and possibly quite close to developers’ intuitive understanding of distance in software systems. Personally, however, I would prefer the map to be based on a more abstract distance measure.

For example, it would be desirable if the call relations displayed on the map had a meaningful interpretation. A long-distance call should have a diagnostic interpretation. Given a layout based on call graphs, however, a long-distance call would just indicate that the force-based layout failed to find a solution where all calls have the same length.

Law-of-Demeter layout: the “Law of Demeter” is a guideline in software design. It states that each method should only talk to its friends, which are defined as its class’s fields, its local variables and its method arguments. Based on this we can define an idealized call-based distance between software artifacts.

Given a LOD-based layout, software artifacts are close to one another if they are supposed to call one another and far apart if they should not call one another. Thus we get the desired property that arrows drawn for the call graph have meaningful lengths. In addition, compared to a raw call graph, a LOD-based graph is less connected and thus better suited for graph drawing.

Best of all: on a LOD-based map, any long-distance call has a diagnostic interpretation that helps developers take action: long calls possibly violate the “Law of Demeter”!
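One possible reading of such an idealized distance is sketched below. It assumes that the distance between two classes is the number of hops in a "friend graph" (each class points to the types of its fields, locals and arguments) and that any observed call spanning more than one hop counts as a long-distance call. The names `lod_distance` and `long_distance_calls` and the sample classes are hypothetical.

```python
from collections import deque
from typing import Dict, List, Set, Tuple


def lod_distance(friends: Dict[str, Set[str]], source: str, target: str) -> float:
    """Hops from source to target in the friend graph (breadth-first search)."""
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        cls, hops = queue.popleft()
        if cls == target:
            return hops
        for nxt in friends.get(cls, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, hops + 1))
    return float("inf")  # target is not reachable through friends


def long_distance_calls(friends: Dict[str, Set[str]],
                        calls: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Observed calls that reach further than a direct friend."""
    return [(a, b) for a, b in calls if lod_distance(friends, a, b) > 1]


# Hypothetical system: Order has a Customer field, Customer has an Address field.
friends = {"Order": {"Customer"}, "Customer": {"Address"}, "Address": set()}
# order.getCustomer().getAddress().getCity() makes Order call Address directly.
calls = [("Order", "Customer"), ("Order", "Address")]
print(long_distance_calls(friends, calls))
```

On a map laid out with this distance, the arrow from Order to Address would be visibly longer than the arrow from Order to Customer.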

History-driven layout: Claus Lewerentz and Frank Steinbrückner developed a layout for software cities that is based on historical data. They start out with a central plaza; each new package then branches off as a new street, and each new class is a building along one of these streets. This generates visually awesome and intuitively stable software maps.

Their work was presented (for the first time?) at the recent MSA 2010 workshop and will be published soon. I will cover their work in more detail when their publication appears. For the moment, please refer to this slide deck of Frank Steinbrückner’s MSA presentation.

Test-dependency-based layout: despite Agile claims, unit tests do depend on one another. We can record these dependencies using dynamic instrumentation and profiling, and establish a partial order of unit tests. Based on that partial order we can do a radial tree layout of the tests and place each software artifact closest to its corresponding tests (kernel method).

Test-dependency-based layout is stable, since changes to test code are typically less frequent than changes to the code under test. This holds in particular for changes to the dependencies between tests.

Also, a test-dependency-based layout has a clear diagnostic interpretation. Along any radial axis, software artifacts on the inside provide services to their clients on the outside. Thus the map’s layout is like a cell, with API interfaces on the outside and a basic kernel at the core.
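As a rough illustration of how a partial order of tests could drive such a layout, the sketch below puts tests on concentric rings. This is a simplification rather than a full radial tree layout, and it skips the placement of artifacts next to their tests. It assumes an acyclic `depends_on` mapping; `rings`, `radial_positions` and the test names are hypothetical.

```python
import math
from typing import Dict, List, Set, Tuple


def rings(depends_on: Dict[str, Set[str]]) -> Dict[str, int]:
    """Ring 0 holds tests without dependencies; others sit one ring
    outside their deepest dependency. Assumes there are no cycles."""
    level: Dict[str, int] = {}

    def visit(test: str) -> int:
        if test not in level:
            deps = depends_on.get(test, ())
            level[test] = 1 + max((visit(d) for d in deps), default=-1)
        return level[test]

    for test in depends_on:
        visit(test)
    return level


def radial_positions(depends_on: Dict[str, Set[str]],
                     ring_gap: float = 1.0) -> Dict[str, Tuple[float, float]]:
    """Spread the tests of each ring evenly on a circle around the core."""
    by_ring: Dict[int, List[str]] = {}
    level = rings(depends_on)
    for test in sorted(level):
        by_ring.setdefault(level[test], []).append(test)
    pos: Dict[str, Tuple[float, float]] = {}
    for ring, tests in by_ring.items():
        radius = (ring + 1) * ring_gap
        for i, test in enumerate(tests):
            angle = 2 * math.pi * i / len(tests)
            pos[test] = (radius * math.cos(angle), radius * math.sin(angle))
    return pos


# Hypothetical recorded dependencies between unit tests.
depends_on = {
    "testParser": set(),
    "testCompiler": {"testParser"},
    "testFormatter": {"testParser"},
    "testCli": {"testCompiler"},
}
pos = radial_positions(depends_on)
print(rings(depends_on))
```

Tests of basic services land near the core and tests of client code further out, which matches the cell-like reading of the map.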

* * *

An important criterion that has not yet been discussed is “ease of retrieval” (there might be a proper term in data mining for that). What I mean is the availability and accessibility of the required data. For example, vocabulary data is always available and easy to access (you don’t even need to parse the source code), while historical data and dynamic instrumentation are often either not available or not easily accessible. So far, I’ve thus only implemented the vocabulary-based layout (featured in the current download of Codemap) and a prototype of the law-of-demeter-based layout.

In this post I’ve only scratched the surface of possible map layouts and their design space. For example, Niko Schwarz took inspiration from Richard Dawkins and proposed code hospitality as a foundation for software maps. Code hospitality is a measure for how likely snippets of one class are to run when copied into another class.

If you’ve got your own crazy ideas, tweet or blog about them!

I’ll collect all pingbacks in a future blog post.

Sneak Peek: Codemap User Study


Friday, January 8th, 2010

Here is a preview of a recently submitted paper on Software Cartography. We report on preliminary results of an ongoing user study with both students and professional developers. Results are mixed and led us to revise the assumption that lexical similarity is sufficient to lay out the map. We are now working on a new distance metric that includes the ideal structural proximity proposed by the “Law of Demeter”. We are also looking into new layout algorithms. For example, anchored MDS would allow developers to rearrange the map according to their system’s architecture.

NB if you are a professional developer from Switzerland, we encourage you to participate in our user study. Please contact David or me for more information.

[Image: Wordle preview of the paper “Towards an Improved Mental Model of Software Developers through Cartographic Visualization”, ICSE NIER 2010]

The preview was created with Wordle, an idea that I owe to Tom Zimmermann.

A Software Cartographer’s Vision


Wednesday, November 25th, 2009

It is my vision that developers can speak of code as “up in the north”, “over in the west”, or “down under in the south”. I want to provide developers (and everyone else involved in software development) with a shared & stable & spatial mental model of their project.

A mental model of code that is shared with your team mates, that is stable over time, and that is spatial so you can grok it. The way I try to reinforce this is by providing a map in your IDE. The map is always visible in the bottom-left, just like the navi in your car. Whatever you do in the IDE is reflected on the map.

When you open a source file, its name pops up on the map.

When you browse references or callers, arrows show up on the map, pointing from where you are to the references or callers, respectively.

When you run tests, tubes pop up filled with colored chemicals in green, yellow, or red.

When you are tracking down a bug in the debugger, arrows pop up and show the current stack trace.

Codemap: sneak peek of call graph visualization.

The idea of the map is to provide you with a spatial model of your software. You will quickly learn that, for example, web UI code is in the north, the database layer in the south, unit tests are in the west, and the whole buggy XML mess over there in the east.

On the map, code is grouped by topic and not by structure. That is, even if your architecture sucks, you’ll get a meaningful map that will guide you out of the mess, as you refactor it.

At first, it can be unfamiliar to see code grouped by topic rather than by packages, but you’ll get used to it very quickly. Also, if you come back to the same project a year later, code is supposed to be roughly at the same location (unless of course, in the meantime, domain tectonics put the world of your application upside-down).

Cross-cutting topics are not always a sign of bad packaging. Just think of “session timeout” in both the user interface and the database layer. You probably won’t be able to factor it out into a common class. But still, Codemap will put it on a landslide connecting north and south, such that you can find both classes in one place when working on that mysterious bug report related to timeouts. And even better, all your domain classes related to time will be grouped closely in the same neighborhood.

…and you will recall the blue avatar of your team mate that was busy down there yesterday. Dropping him a line of chat might be the quickest way to get that bug killed.

A prototype of Codemap is available for Eclipse.

Not everything works up to our expectations yet, but you are welcome to give it a try and let us know how you like it. Just follow @codemap on Twitter for news, feedback and questions.

— Adrian and David

TermMap of OOPSLA


Tuesday, October 21st, 2008

While browsing the proceedings of this year’s OOPSLA, I thought, hey, let’s create a themescape of the proceedings. So I fired up SoftwareCartographer and created a “code map” of all PDFs found on the CD. Normally I use SoftwareCartographer to analyse the vocabulary of software systems, but since it operates on vocabulary only, it can be applied to normal text files as well.

But before we dive into software cartography, here is the word cloud of all proceedings documents:

Obviously, there are many Java programmers fighting with their type systems at OOPSLA. In the cloud above, the terms are weighted by their number of occurrences in the proceedings documents. I guess in a cloud weighted by fun, Smalltalk Superpowers and Animal Verbing would show up the largest.

In the picture above, we see the “CodeMap” of OOPSLA together with the word clouds of selected papers (click for larger version). CodeMap is a visualization that shows source code files (here PDF files) and how similar they are in terms of vocabulary [WCRE 2008]. Each file is rendered as a hill, with the file size used as the hill’s height. The location of the files reflects topical similarity: files that use the same vocabulary are close to each other, and files that use different vocabulary are far apart from each other.
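The hill rendering can be pictured as an elevation function over the map. The sketch below assumes each file contributes a Gaussian-shaped bump whose peak equals the file size; the Gaussian shape, the `spread` parameter and the name `elevation` are assumptions for illustration and not necessarily what SoftwareCartographer does.

```python
import math
from typing import Iterable, Tuple

Hill = Tuple[float, float, float]  # x, y, peak height (e.g. file size)


def elevation(x: float, y: float, hills: Iterable[Hill], spread: float = 1.0) -> float:
    """Sum of Gaussian bumps, one per file, evaluated at the point (x, y)."""
    return sum(
        height * math.exp(-((x - hx) ** 2 + (y - hy) ** 2) / (2 * spread ** 2))
        for hx, hy, height in hills
    )


# Two hypothetical files: a large one at the origin, a smaller one further east.
hills = [(0.0, 0.0, 10.0), (5.0, 0.0, 4.0)]
print(elevation(0.0, 0.0, hills), elevation(2.5, 0.0, hills))
```

Sampling this function on a grid and drawing contour lines at fixed heights would give the islands-and-hills look of the map.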

SoftwareCartographer is written in Smalltalk. If you have VW installed, you can download the WCRE demo and apply it to your own software systems or conference papers. SoftwareCartographer uses Hapax and Pimon, but not Moose.
