Understanding large-scale software – A hierarchical view (2019)

Use top-down approaches if you need to maintain systems

Large software systems are more expensive to maintain – not because changes require more code, but because such systems are harder to understand. Many developers and studies focus on measures like cyclomatic complexity and on API documentation, but these are of limited help when you need to understand an entire system.

Why it matters

Most research in program comprehension focuses on understanding code. That certainly offers valuable insights, but it’s not necessarily representative of how developers actually work with software.

In practice, developers who maintain software systems need to understand more than just the code: they have to know what the rest of the system looks like, and how their changes affect it.

How do developers manage to understand entire systems? What methods do developers use to gain a better understanding of the systems they’re working on?

How the study was conducted

The authors conducted semi-structured interviews with 11 experienced developers, managers, architects, and entrepreneurs from different organisations.

What discoveries were made

The study sheds light on how developers learn about the inner workings of large software systems.

Depth of comprehension

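There are two major levels of comprehension:

  1. Black-box comprehension: knowing what a component does and how to use it, without knowing how it works internally
  2. White-box comprehension: knowing how a component works internally, i.e. understanding its implementation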

Some interviewees point out that there are other levels as well:

Avoiding comprehension

Full, white-box comprehension of a system often isn’t just unattainable – it’s also undesirable. Developers prefer to avoid actual comprehension whenever possible, by:

Comprehension strategies

There are two approaches that one can take when attempting to understand a system: top-down or bottom-up. Top-down approaches typically involve design documents and API documentation, while bottom-up approaches are more likely to involve the actual source code and possibly some inline documentation.

Both approaches have their up- and downsides: for example, a top-down approach makes it easy to understand which components are used and how, but it can also quickly overwhelm newcomers. Some therefore suggest combining the two approaches.

However, developers who gravitate towards top-down approaches seem better able to comprehend large volumes of code, as this lets them defer understanding details that don’t matter yet.
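As a rough illustration of deferring details, the sketch below uses Python’s standard `ast` module to build a top-level outline of a module (class and function names plus the first line of each docstring) without descending into any bodies. This is just one possible way to start top-down at the code level, not a technique from the study; `outline`, `Cache`, and `load_config` are hypothetical names.

```python
import ast
import textwrap


def outline(source: str) -> list[str]:
    """Return a top-level outline of a module's source code.

    Only top-level classes and functions are listed; their bodies
    (including nested methods) are deliberately not inspected.
    """
    tree = ast.parse(source)
    entries = []
    for node in tree.body:  # top-level statements only; details are deferred
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            kind = "class" if isinstance(node, ast.ClassDef) else "def"
            doc = ast.get_docstring(node)
            summary = doc.splitlines()[0] if doc else "(no docstring)"
            entries.append(f"{kind} {node.name}: {summary}")
    return entries


# Hypothetical module source used only for this example.
SAMPLE = textwrap.dedent('''
    class Cache:
        """In-memory cache for parsed configuration files."""

        def get(self, key):
            ...


    def load_config(path):
        """Read and validate a configuration file."""
        ...
''')

for line in outline(SAMPLE):
    print(line)
```

This prints `class Cache: In-memory cache for parsed configuration files.` followed by `def load_config: Read and validate a configuration file.`; the method `Cache.get` is left for later, once the outline shows that `Cache` is worth a closer look.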

Aside from top-down and bottom-up approaches, the interviewees also mention other methods to become acquainted with unfamiliar systems:

Hierarchical comprehension

Comprehension means different things at different levels.

Functions

Interviewees define understanding of a function as understanding its contract: what parameters does it accept, what does it do, and what does it return? White-box comprehension generally isn’t necessary, except in specific situations.
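To make the contract view concrete, here is a minimal Python sketch of a hypothetical `moving_average` function (not taken from the study) whose docstring states what it accepts, what it does, and what it returns. A caller can use it correctly from the docstring alone, without reading the body.

```python
def moving_average(values: list[float], window: int) -> list[float]:
    """Return the simple moving averages of `values`.

    Parameters:
        values: the input series.
        window: number of consecutive items per average; must be >= 1.

    Returns:
        A list of len(values) - window + 1 averages,
        or an empty list if window > len(values).

    Raises:
        ValueError: if window < 1.
    """
    if window < 1:
        raise ValueError("window must be >= 1")
    if window > len(values):
        return []
    # Sliding-window sum: add the newest item and drop the oldest one.
    running = sum(values[:window])
    result = [running / window]
    for i in range(window, len(values)):
        running += values[i] - values[i - window]
        result.append(running / window)
    return result


print(moving_average([1, 2, 3, 4], 2))  # [1.5, 2.5, 3.5]
```

The sliding-window body is an implementation detail; someone relying only on the contract would not need to read it unless the function misbehaves.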

Classes

At the class level, designer intent starts to become more important than interfaces. To really understand a class, one needs to know why it exists and when it should (not) be used.

This information cannot be found in the code itself, but can sometimes be found in design documentation or by learning more about the parts of the system where the class is used or located.

Packages

Packages (e.g. third-party libraries) are collections of classes that can easily be reused “as is”.

One can understand how a package works without the need to consider the rest of the system. This makes packages the only level at which black-box comprehension is often sufficient.
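A minimal sketch of package-level black-box use, with Python’s standard `collections` module standing in for a package: the hypothetical helper `top_words` relies only on the documented behaviour of `Counter` and `Counter.most_common`, never on how they are implemented.

```python
from collections import Counter


def top_words(text: str, n: int) -> list[tuple[str, int]]:
    """Return the n most frequent lowercase words in text, with their counts."""
    # Counter is used purely through its documented interface; nothing
    # here depends on how the collections module implements it.
    return Counter(text.lower().split()).most_common(n)


print(top_words("the cat and the hat and the bat", 2))  # [('the', 3), ('and', 2)]
```

Understanding this code requires knowing what `Counter` promises, not how it works or where it is used elsewhere in the system.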

Systems

It’s a bit harder to define what it means to understand an entire system. Many interviewees talk about:

The actual code is completely irrelevant at this level. But it’s not enough to understand the system in its current state: its history and evolution also need to be taken into account. The structure – or even the entire project – may reflect intentions, constraints, and choices that are no longer relevant. This makes it challenging to understand systems.

The important bits

  1. Large volumes of code may be easier to comprehend using a top-down approach as it allows you to skip unnecessary details
  2. System comprehension is largely based on understanding of structure and intent rather than code