Random notes on software, programming and languages.
By Adrian Kuhn

Archive for April, 2010

Software by Night


Wednesday, April 21st, 2010

In this post I’ll present an idea from unpublished work that uses light effects to visualize dynamic information in Software Cartography. (If you are unfamiliar with the spatial representation of software, please refer to “A Software cartographer’s Vision” previously featured on this blog.)

“Software is not just structural, but behavioral; so the next thing I would like to see is traffic and people walking.” — Rob Deline, in the question session of Frank Steinbrückner’s presentation at MSA 2010

Software visualizations typically convey a static view of software systems. Software maps are no different. On a software map all code of a system is shown, no matter whether it is used at runtime or is dead code. The same holds for Earth seen from space: by day you see all land masses, but only at night do the lights reveal where human activity takes place and where it does not. My idea is to realize the same for software maps:

  • use static information to group the software artifacts into structural clusters, visualized as land masses.
  • use dynamic information to embellish the visualization with behavior activity, visualized as sources of lights.

Light effects, such as flares and glow, make the execution of software visible. What we actually do is to add more dimensions to a two-dimensional visualization, just as Michele Lanza did with his seminal polymetric views. There seem to be three promising dimensions for light effects

  • brightness of light source,
  • blurriness of surrounding glow,
  • intensity of flares that flash up,

that are possibly best used in combination to visualize one or two behavioral properties.

When modelling animated visualizations, the notion of change is essential. Ben Fry’s work on organic visualization is a good source of inspiration for modelling time-dependent values. In his work he proposed integrators as a new numerical data type with first-class support for continuously changing values (e.g. ease-in and ease-out).

Unlike static numerical values, integrators continuously change their value over time. Integrator values live in the continuous rather than the discrete domain: they grow and decay rather than increment and decrement.

Fry associates each visual component with a set of static and changing values, using the data types Numerical and Integrator. He describes an integrator as “a continuously changing value” and defines the following operations on integrators:

  • set() explicitly set the current value of the Integrator. Normally, this is only used to set an initial value for the Integrator.
  • impulse() adds a specified amount of force to the Integrator. Equivalent to
    incrementing a Numeric value, but executed in the continuous domain,
    i.e. the amount added attenuates over time.
  • decay() the opposite of an impulse. This is a decrease in the continuous domain. Often used to atrophy values over time.
  • attract() apply a force to move the Integrator towards a particular value. Instead of setting the Integrator to a particular value, a target value is set for the Integrator that it will reach over time. Enables smooth
    transitions if the target is changing.
  • repel() the opposite of attract, moves the value of the Integrator away from a particular numeric value. If the value being avoided is greater, the
    Integrator is decreased further. If less, the Integrator is increased.
  • update() this is used internally to update the Integrator’s current value on each time step, after calculating a new velocity based on the forces
    that have been applied to the Integrator by the rules that affect it.
  • reset() after the current value is updated on each time step, the forces are cleared. New forces are added on each time step by re-applying the rules.
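
The operations listed above can be sketched in a few lines of Python. This is only an illustration: the physics (unit mass, a fixed damping factor, spring-like attraction) and all parameter names are assumptions made for the sketch, and Fry’s own implementation may well differ.

```python
class Integrator:
    """A continuously changing value (illustrative sketch, not Fry's code)."""

    def __init__(self, value=0.0, damping=0.5):
        self.value = value
        self.velocity = 0.0
        self.force = 0.0
        self.damping = damping  # assumed: fraction of velocity kept per step

    def set(self, value):
        """Explicitly set the current value, e.g. as an initial value."""
        self.value = value
        self.velocity = 0.0

    def impulse(self, amount):
        """Add a force; its effect on the velocity attenuates over time."""
        self.force += amount

    def decay(self, rate):
        """Pull the value towards zero, proportionally to its current size."""
        self.force -= rate * self.value

    def attract(self, target, strength=0.2):
        """Apply a spring-like force towards a target value."""
        self.force += strength * (target - self.value)

    def repel(self, avoid, strength=0.2):
        """Push away from a value: down if it is greater, up otherwise."""
        direction = -1.0 if avoid > self.value else 1.0
        self.force += direction * strength

    def update(self):
        """Turn accumulated forces into a damped velocity, then move."""
        self.velocity = (self.velocity + self.force) * self.damping
        self.value += self.velocity

    def reset(self):
        """Clear the forces; rules re-apply them on the next time step."""
        self.force = 0.0


def run(integrator, rule, steps):
    """Apply a rule, update and reset once per time step."""
    for _ in range(steps):
        rule(integrator)
        integrator.update()
        integrator.reset()
    return integrator.value
```

With these assumed defaults, calling `run(Integrator(), lambda i: i.attract(10.0), 100)` lets the value glide towards 10 instead of jumping there, which is the smooth transition that attract() is meant to provide.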

To implement Software by Night, integrators are the natural way to add light effects to the still view of a static software map. So far we have formalized two light dimensions, flare and glow. When a software artifact is used, it flashes up with a sudden impulse to attract the user’s attention and then decays into a slowly disappearing glow. Calls are visualized by sparks that jump from call-site to call-site. Users can thus see where action is currently taking place, but also where past action took place and how often.
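
To make the flare-and-glow behavior concrete, here is a rough Python sketch that replays an execution trace. The trace format, the class and function names and the two decay factors are all hypothetical; the prototype mentioned in this post may model the lights quite differently, and sparks between call-sites are left out.

```python
from collections import defaultdict


class ArtifactLight:
    """Light state of one software artifact (hypothetical model)."""

    def __init__(self, flare_keep=0.5, glow_keep=0.95):
        self.flare = 0.0              # short, bright flash
        self.glow = 0.0               # slowly fading afterglow
        self.flare_keep = flare_keep  # assumed: flare halves every step
        self.glow_keep = glow_keep    # assumed: glow loses 5% every step

    def hit(self):
        """The artifact was executed: flash up and add to the glow."""
        self.flare = 1.0
        self.glow += 1.0

    def step(self):
        """Let both light effects fade by one time step."""
        self.flare *= self.flare_keep
        self.glow *= self.glow_keep


def replay(trace, steps):
    """Replay (time_step, artifact_name) events and return the lights."""
    events = defaultdict(list)
    for time_step, name in trace:
        events[time_step].append(name)
    lights = defaultdict(ArtifactLight)
    for time_step in range(steps):
        for light in lights.values():
            light.step()
        for name in events[time_step]:
            lights[name].hit()
    return dict(lights)


trace = [(0, "Parser"), (1, "Lexer"), (2, "Parser")]
lights = replay(trace, steps=10)
```

Because the glow accumulates on every hit while the flare is simply reset, an artifact that is used often ends up with a stronger afterglow, and an artifact that never runs stays dark.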

Please note that “Software by Night” is not yet available in the downloadable Codemap plug-in. I am working on an external prototype for an upcoming publication, though. The missing integration in Eclipse is mainly due to the lack of an easily accessible source of dynamic information within Eclipse.

To learn about new Codemap releases, follow http://twitter.com/codemap

Happy Hacking!

Exploring the Layout of Software Maps


Thursday, April 1st, 2010

In this post I will cover my current work on Software Cartography. If you are unfamiliar with the spatial representation of software, please refer to “A Software cartographer’s Vision” previously featured on this blog.

Codemap is available for download as an Eclipse plug-in.

A major issue of software cartography is the base layout of software maps. A good base layout is both stable over time and conveys a meaningful grouping of software artifacts into islands (i.e. clusters). My initial attempt was to base the layout on vocabulary found in the source code, that is, identifier names and comments. Vocabulary has the nice property that it is more stable over time than structure, and that it naturally conveys a meaningful clustering of latent topics.
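
As a rough illustration of what a vocabulary-based distance can look like, the Python sketch below counts the terms found in identifiers and comments and compares two artifacts by cosine distance. This is a plain bag-of-words stand-in, not Codemap’s actual pipeline, and the function names and sample snippets are made up for the example.

```python
import math
import re
from collections import Counter


def vocabulary(source):
    """Count lowercase terms in identifiers and comments, without parsing."""
    terms = []
    for token in re.findall(r"[A-Za-z]+", source):
        # Split camelCase and acronyms: "XMLParser" -> "XML", "Parser".
        parts = re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+", token)
        terms.extend(part.lower() for part in parts)
    return Counter(terms)


def cosine_distance(a, b):
    """Return 1 - cosine similarity of two term-count vectors."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(count * count for count in a.values()))
    norm_b = math.sqrt(sum(count * count for count in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


parser = vocabulary("class TokenParser { Token parseToken() { /* parse next token */ } }")
lexer = vocabulary("class TokenLexer { Token nextToken() { /* scan next token */ } }")
account = vocabulary("class BankAccount { void deposit(int amount) { /* add amount */ } }")
```

Here the parser and the lexer share most of their terms and end up close together, while the bank account lands far away from both. A layout algorithm would then try to preserve these distances on the map.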

However, a first user study has shown that a vocabulary-based layout does not meet the developers’ intuition. Even though the developers were well aware that the layout was based on vocabulary, they drew conclusions that assumed a structure-based layout. We learnt this from the think-aloud protocol used in the user study.

The user study included six developers, each of whom had 1.5 hours to explore an unknown software system. They were given six tasks of increasing complexity, culminating in fixing an actual bug.

I will cover the user study in more detail in a future blog post. In this post I shall discuss some proposals for alternative layouts…

* * *

Package-based layout — is typically the first layout that comes to people’s minds when they hear about software cartography. In fact there is a rich body of previous work that uses package-based layouts. For example, Codecity uses a treemap layout to visualize software systems based on their package structure.

A treemap layout makes good use of screen space, and using packages is likely to convey a meaningful clustering (ignoring for the moment that Java’s package nesting is a mere naming convention and bears no language semantics whatsoever).

However, package-based layouts are not stable in the face of change. They are gap-less, and thus major parts of the map have to be moved aside in order to make space for new elements (and vice versa for disappearing elements). Codecity works around this limitation by offering an after-the-fact analysis mode where the layout of past snapshots anticipates the latest state of the system. However, this is only applicable to post-mortem analysis, not to tools that are embedded in a development environment with live changes.
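
The instability of gap-less layouts is easy to reproduce with a toy example. The Python sketch below lays out packages along a single strip, which is a deliberately simplified one-level “slice” of a treemap and not Codecity’s algorithm; the package names and sizes are invented.

```python
def slice_layout(sizes, width=100.0):
    """Gap-less strip layout: return {name: (x_start, x_end)}."""
    total = sum(sizes.values())
    layout = {}
    x = 0.0
    for name in sorted(sizes):
        extent = width * sizes[name] / total
        layout[name] = (x, x + extent)
        x += extent
    return layout


def moved(before, after, eps=1e-9):
    """Names present in both layouts whose start or end has shifted."""
    return sorted(
        name
        for name in before
        if name in after
        and (abs(before[name][0] - after[name][0]) > eps
             or abs(before[name][1] - after[name][1]) > eps)
    )


sizes = {"core": 50, "ui": 30, "util": 20}
before = slice_layout(sizes)
after = slice_layout({**sizes, "net": 25})
shifted = moved(before, after)
```

Adding the single package `net` shifts or resizes every existing package, because there are no gaps that could absorb the new element.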

Callgraph-based layout — is the other common layout in software visualization. Static analysis is used to find the call-relations between software artifacts, and then standard force-based graph drawing is applied to visualize the system. Force-based graph drawing adapts well to change; however, this advantage is canceled out by call-relations having a very high change frequency.

Call-relations are well understood by developers and possibly quite close to developers’ intuitive understanding of distance in software systems. Personally, however, I would prefer if the map were based on a more abstract distance measurement.

For example, it would be desirable if call-relations that are displayed on the map had a meaningful interpretation. A long-distance-call should have a diagnostic interpretation. Given a layout based on call-graphs, however, a long-distance-call would just indicate failure of the force-based layout to find a solution where all calls have the same length.

Law-of-demeter layout — the “Law of Demeter” is a guideline in software design. It states that each method should only talk to its friends, which are defined as its class’s fields, its local variables and its method arguments. Based on this we can define an idealized call-based distance between software artifacts.

Given a LOD-based layout, software artifacts are close to one another if they are supposed to call one another and far apart if they should not call one another. Thus we get the desired property that visualizing call-graphs conveys meaningful arrow distances. Also, compared to a raw call-graph, a LOD-based graph is less connected and thus better suited for graph drawing.

Best of all: on a LOD-based map, any long-distance-call has a diagnostic interpretation that helps developers take action: a long call possibly violates the “Law of Demeter”!
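
One possible reading of such an idealized distance is the number of hops between two classes in the graph of friends. The Python sketch below uses that reading to flag long-distance-calls; the hop-count definition, the treatment of the friend graph as undirected, and the sample classes are all assumptions for illustration, not the definition used in the Codemap prototype.

```python
from collections import deque


def friend_distance(friends, source, target):
    """Hops between two classes in the undirected friend graph, or None."""
    neighbours = {}
    for cls, cls_friends in friends.items():
        for friend in cls_friends:
            neighbours.setdefault(cls, set()).add(friend)
            neighbours.setdefault(friend, set()).add(cls)
    queue = deque([(source, 0)])
    seen = {source}
    while queue:  # breadth-first search finds the shortest hop count
        node, distance = queue.popleft()
        if node == target:
            return distance
        for nxt in neighbours.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, distance + 1))
    return None


def long_distance_calls(friends, calls, max_hops=1):
    """Return (caller, callee, distance) for calls between non-friends."""
    flagged = []
    for caller, callee in calls:
        distance = friend_distance(friends, caller, callee)
        if distance is None or distance > max_hops:
            flagged.append((caller, callee, distance))
    return flagged


# Friends as derived from field, local variable and argument types.
friends = {"Order": {"Customer"}, "Customer": {"Address"}, "Address": set()}
calls = [("Order", "Customer"), ("Order", "Address")]
suspects = long_distance_calls(friends, calls)
```

In this example `Order` calling `Address` is two hops away, the typical shape of a chained call such as `order.getCustomer().getAddress().getCity()`, so it is reported as a suspect.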

History-driven layout — Claus Lewerentz and Frank Steinbrückner developed a layout for software cities that is based on historical data. They start out with a central plaza; each new package then branches off as a new street, and each new class is a building along these streets. This generates visually awesome and intuitively stable software maps.

Their work was (first?) presented at the recent MSA 2010 workshop and will be published soon. I will cover their work in more detail when their publication appears. For the moment, please refer to this slide deck of Frank Steinbrückner’s MSA presentation.

Test-dependency-based layout — despite Agile claims, unit tests do depend on one another. We can record these dependencies using dynamic instrumentation and profiling, and establish a partial order of unit tests. Based on that partial order we can do a radial tree layout of the tests and place each software artifact closest to its corresponding tests (kernel method).

Test-dependency-based layout is stable, since changes to test code are typically less frequent than changes to the code under test. This holds in particular for changes to the dependencies between tests.

Also, a test-dependency-based layout has a clear diagnostic interpretation. Along any radial axis, software artifacts on the inside provide services to their clients on the outside. Thus the map’s layout is like a cell, with API interfaces on the outside and a basic kernel at the core.
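
The inside-to-outside ordering can be sketched as follows in Python. Each test is assigned a ring equal to the length of its longest dependency chain and is then spread evenly around that ring. This is a simplification using concentric rings rather than a true radial tree layout, it assumes the dependencies are acyclic, it leaves out the placement of artifacts near their tests, and the test names are invented.

```python
import math


def ring_of(depends_on):
    """Return {test: ring}; ring 0 holds tests without dependencies."""
    cache = {}

    def depth(test):
        if test not in cache:
            deps = depends_on.get(test, ())
            # Longest chain of dependencies below this test.
            cache[test] = 1 + max((depth(d) for d in deps), default=-1)
        return cache[test]

    return {test: depth(test) for test in depends_on}


def radial_positions(depends_on, ring_spacing=1.0):
    """Place each test on its ring, spread evenly by angle."""
    by_ring = {}
    rings = ring_of(depends_on)
    for test in sorted(rings):
        by_ring.setdefault(rings[test], []).append(test)
    positions = {}
    for ring, tests in by_ring.items():
        radius = ring * ring_spacing
        for index, test in enumerate(tests):
            angle = 2 * math.pi * index / len(tests)
            positions[test] = (radius * math.cos(angle),
                               radius * math.sin(angle))
    return positions


depends_on = {
    "testList": set(),
    "testStack": {"testList"},
    "testQueue": {"testList"},
    "testParser": {"testStack"},
}
rings = ring_of(depends_on)
positions = radial_positions(depends_on)
```

In this example `testList` sits at the core, `testStack` and `testQueue` share the first ring, and `testParser` ends up on the outermost ring, matching the idea of a kernel surrounded by its clients.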

* * *

An important criterion that has not yet been discussed is “ease of retrieval” (there might be a proper term for that in data mining). What I mean is the availability and accessibility of the required data. For example, vocabulary data is always available and easy to access (you don’t even need to parse the source code), while historical data and dynamic instrumentation are often either not available or not easily accessible. So far, I’ve thus only implemented the vocabulary-based layout (featured in the current download of Codemap) and a prototype of the law-of-demeter-based layout.

In this post I’ve only scratched the surface of possible map layouts and their design space. For example, Niko Schwarz took inspiration from Richard Dawkins and proposed code hospitality as a foundation for software maps. Code hospitality is a measure for how likely snippets of one class are to run when copied into another class.

If you’ve got your own crazy ideas, tweet or blog about them!

I’ll collect all pingbacks in a future blog post.
