Recovering software architectures via chain-of-thought prompting

Published: 5 Jan 2025
Written by: Chun Fei Lung

Large language models can be used to deduce to what extent your system’s implementation still adheres to a reference architecture.

A system’s architecture is like a Roman temple: if you’ve seen one, you’ve seen them all.

When I apply for jobs, it’s usually for positions in development teams that work on one or several brownfield projects. Every once in a while, these projects will have this thing called “up-to-date software documentation”, but most of the time the documentation is either outdated, incomplete, or completely non-existent.

This is rarely a real problem, because in most cases, team members will have some working knowledge of the system’s architecture that I can use to get up to speed and do things “the right way”. But in some rare situations (side note: this actually happens more often than you might think, for example when the maintenance of a software system is handed over to a new contractor), no one on the team really knows how the system works. The architecture then has to be reverse-engineered, an arduous process that involves a lot of diagramming with boxes and arrows.

This process is called software architecture recovery (SAR). There are several ways to do this, but most are based on inductive recovery, where software artifacts are used to recover the current architecture.

However, software typically evolves in ways that violate the integrity of its architecture. The authors of the paper below therefore argue that you’ll want to follow a deductive approach instead, which they call deductive software architecture recovery. This approach allows you to determine how the implementation of the system deviates from its intended architecture.

About the article

Title	Deductive software architecture recovery via chain-of-thought prompting
Year	2024
Author(s)	Satrio Adi Rukmono (Eindhoven University of Technology and Institut Teknologi Bandung); Lina Ochoa (Eindhoven University of Technology); Michel R.V. Chaudron (Eindhoven University of Technology)
Venue	International Conference on Software Engineering

Deductive software architecture recovery is structured in two phases: a so-called reference architecture (RA) definition phase and a code units classification phase.

Reference architecture definition

The first step of the RA definition phase is selecting a reference architecture that captures key aspects of the system’s architectural design. The RA that applies to a system should usually be known by the system’s architects and engineers, but otherwise a common RA (like layered architecture, pipes-and-filters, or model-view-controller) may be selected instead.

The second step of the RA definition phase focusses on defining the architectural components that make up the system. Examples of such components include the presentation layer in a layered architecture, or models, views and controllers in MVC architectures.

In the third step, these components and the interactions between them are defined in terms of indicators: source code units (methods, classes, function calls, etc.) that are typically associated with them based on best practices of the RA or the technology stack used to implement the system. The definitions can be specified using plain English:

#	Indicator for presentation layer
PR1	…sets the attributes of UI components, e.g. sets the text of a TextView.
PR2	…notifies listeners about user events, such as button clicks or list item selections.
PR3	…transforms domain objects into visual representations.
PR4	…performs validation on user input.
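Because the indicators are just plain-English sentences, they are easy to keep as data next to the code that uses them. The snippet below is a minimal sketch of how that might look; the `Indicator` class, the `format_indicators` helper, and the "code: description" output format are my own assumptions and do not come from the paper.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Indicator:
    """One plain-English indicator (hypothetical structure, not from the paper)."""
    code: str         # identifier such as "PR1"
    layer: str        # architectural component the indicator points to
    description: str  # the natural-language definition


# The four presentation-layer indicators from the table above.
PRESENTATION_INDICATORS: List[Indicator] = [
    Indicator("PR1", "presentation",
              "sets the attributes of UI components, e.g. sets the text of a TextView."),
    Indicator("PR2", "presentation",
              "notifies listeners about user events, such as button clicks or list item selections."),
    Indicator("PR3", "presentation",
              "transforms domain objects into visual representations."),
    Indicator("PR4", "presentation",
              "performs validation on user input."),
]


def format_indicators(indicators: List[Indicator]) -> str:
    """Render indicators as one line each, ready to be pasted into a prompt."""
    return "\n".join(f"{i.code}: {i.description}" for i in indicators)
```

Calling `format_indicators(PRESENTATION_INDICATORS)` yields four lines, one per indicator, in the order in which they were defined.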

Code units classification

In the second phase, actual units in the source code are classified, based on their syntax and semantics, using the indicators that were defined previously. This is normally done using static analysis, which is precise but often suffers from poor recall, as only things that are specified explicitly will be taken into account.

Specifications in natural language, paired with the use of a large language model capable of deductive reasoning, like OpenAI’s GPT-4, allow for more accurate classification of units into component classes.

The box below shows an example of a prompt that can be used:

In a layered software architecture, one of the layers is the (layer_name)
layer, which (layer_responsibility).

Consider the context of an Android Java project “(project_name)”:
(project_domain_description)

Here are some indicators that a Java method in the project may
belong to a class in the (layer_name) layer:
  (layer_indicators)

The class ‘(class_name)’ contains the method ‘(method_name)’:
  (method_source_code)

Check whether this method satisfies each indicator above. Mention
the specific line of code that supports your reason. At the very last
line, write the boolean verdicts separated by a comma, e.g., ‘true,
true, false, true’. If indeterminate, say ‘false’.

Finally, the classification results are aggregated. This is done simply by counting, for each class, how many indicators of each layer are satisfied by its methods. The layer with the highest count dominates the class, and is the layer to which the class belongs.
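The aggregation step can be sketched in a few lines. This is a rough illustration rather than the authors' implementation: the `dominant_layer` function, the shape of its input, the layer names in the usage note, and the choice to return `None` when there is a tie or when no indicator is satisfied are all assumptions on my part.

```python
from collections import Counter
from typing import Iterable, List, Optional, Tuple


def dominant_layer(results: Iterable[Tuple[str, List[bool]]]) -> Optional[str]:
    """Pick the layer with the most satisfied indicators for a single class.

    `results` holds one (layer, verdicts) pair for every method/layer
    combination that was checked for the class.
    """
    counts: Counter = Counter()
    for layer, verdicts in results:
        counts[layer] += sum(verdicts)  # True counts as 1, False as 0

    # Assumption: without any satisfied indicator, the class stays unclassified.
    if not counts or max(counts.values()) == 0:
        return None

    ranked = counts.most_common(2)
    # Assumption: a tie between the two best layers is left undecided.
    if len(ranked) == 2 and ranked[0][1] == ranked[1][1]:
        return None
    return ranked[0][0]
```

For a class whose methods satisfy four indicators of a hypothetical "presentation" layer and only one indicator of a hypothetical "domain" layer, the function returns `"presentation"`.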

Preliminary results

A proof of concept was created that aims to recover the architecture of K-9 Mail, an open-source email client for Android.

Deductive software architecture recovery was applied to a random selection of 54 out of 779 classes using the process described above. When the results were compared with those from a manual classification, the (semi-)automated process achieved an overall accuracy of 70%, which is already quite impressive for a preliminary result.

The accuracy will likely increase in future iterations, as will the “user-friendliness” of the process. For instance, the authors intend to design a “gold standard” RA that works reasonably well for general-purpose software architecture recovery, thereby removing the need for the first phase.

Ultimately, the authors hope that deductive software architecture recovery can someday be used to provide software architecture explanations in a human-like manner, which can be helpful for onboarding processes and architecture conformance checks.

Summary

Software architecture recovery is the process of recovering a system’s architecture from its implementation artifacts.
Deductive software architecture recovery can help you assess how the implementation deviates from its intended architecture.
