Uncovering architectural design decisions (2018)
Systems are easier to maintain if one understands why their code and architecture look the way they do. Unfortunately, that “why” often isn’t documented. To address this issue, Shahbazian et al. developed RecovAr, a technique that allows partial recovery of design decisions from a project’s issue tracker and version control repository.
Why it matters
An engineer who understands the architectural impact of their changes is less likely to deliver code that introduces regressions or architectural inefficiencies.
That’s only possible, of course, if they know what the architecture looks like and why it looks that way.
Unfortunately, the decisions made during architectural design are rarely well-documented and mostly reside in architects’ and engineers’ heads – at least, until they leave the project and the knowledge is lost forever.
How the study was conducted
Source code nowadays is usually stored in version control repositories, which contain the complete history of changes to the code in the form of commits. These commits often include references to unique identifiers in issue trackers like JIRA or YouTrack.
The authors propose a technique called RecovAr, which extracts information from these two sources to reconstruct the rationale behind architectural choices. This happens in three stages:
Change analysis: We first need to know how a system’s architecture changes between versions. This can be done by determining the minimal set of changes from one architecture to another. (Architectures can be extracted using several recovery techniques, like the Algorithm for Comprehension-Driven Clustering (ACDC) and Architecture Recovery using Concerns (ARC). You can read all about these techniques in the original article.) One may consider these changes to be the architectural consequences of design decisions.
Mapping: The rationale for changes is typically described in a project’s issue tracker. The second stage involves relating issues to code changes by mining the issues’ commit logs and pull requests.
Decision extraction: Finally, architectural changes are related to issues to construct a graph with decisions (rationales) and changes (consequences).
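To make the three stages a bit more concrete, here is a minimal Python sketch. It is not the authors' implementation. It assumes that an architecture is a mapping from component names to sets of code entities, that components can be matched by name, and that commit messages mention JIRA-style issue keys. All function names and the toy data are made up for illustration.

```python
import re
from collections import defaultdict

# Assumed issue-key pattern in the JIRA style, e.g. "HADOOP-1234".
ISSUE_KEY = re.compile(r"\b[A-Z][A-Z0-9]+-\d+\b")


def diff_architectures(old, new):
    """Stage 1: list per-component changes between two recovered architectures.

    Each architecture maps a component name to the set of code entities in it.
    Matching components by name is a simplification made for this sketch.
    """
    changes = []
    for component in sorted(old.keys() | new.keys()):
        before = old.get(component, set())
        after = new.get(component, set())
        if before != after:
            changes.append({
                "component": component,
                "added": after - before,
                "removed": before - after,
            })
    return changes


def map_issues_to_entities(commits):
    """Stage 2: relate issues to the code entities their commits touched."""
    touched = defaultdict(set)
    for message, entities in commits:
        for key in ISSUE_KEY.findall(message):
            touched[key].update(entities)
    return dict(touched)


def link_issues_to_changes(changes, issue_entities):
    """Stage 3: emit (issue, change index) edges where the entities overlap."""
    edges = []
    for index, change in enumerate(changes):
        affected = change["added"] | change["removed"]
        for issue, entities in sorted(issue_entities.items()):
            if affected & entities:
                edges.append((issue, index))
    return edges


# Toy input, invented purely for illustration.
old = {"io": {"Reader", "Writer"}, "net": {"Client"}}
new = {"io": {"Reader"}, "net": {"Client", "Writer"}}
commits = [
    ("DEMO-1: move Writer next to the client code", {"Writer"}),
    ("fix typo in README", {"README"}),
]

changes = diff_architectures(old, new)
edges = link_issues_to_changes(changes, map_issues_to_entities(commits))
print(edges)  # [('DEMO-1', 0), ('DEMO-1', 1)]
```

In this toy example, moving `Writer` shows up as two architectural changes (one per affected component), and both are linked to the single issue whose commit touched that entity.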
What discoveries were made
RecovAr extracts three types of decisions:
Simple decisions, which consist of a single change triggered by a single issue.
Compound decisions with multiple issues which together trigger a single change.
Cross-cutting decisions (e.g. improving system reliability or performance) that include multiple changes and one or more issues.
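The three types can be told apart by counting the issues and changes that belong together. The sketch below assumes that every connected group of issues and changes in the graph forms one decision. That is my reading of the definitions above, not a detail confirmed here. The function name, issue keys, and change identifiers are hypothetical.

```python
from collections import defaultdict


def classify_decisions(edges):
    """Group (issue, change) edges into decisions and label each one.

    Treating every connected group of issues and changes as one decision is
    an assumption of this sketch.
    """
    # Build an undirected adjacency list; the tags keep issues and changes apart.
    neighbours = defaultdict(set)
    for issue, change in edges:
        neighbours[("issue", issue)].add(("change", change))
        neighbours[("change", change)].add(("issue", issue))

    decisions = []
    seen = set()
    for start in sorted(neighbours):
        if start in seen:
            continue
        # Depth-first walk to collect one connected group.
        group, stack = set(), [start]
        while stack:
            node = stack.pop()
            if node in group:
                continue
            group.add(node)
            stack.extend(neighbours[node] - group)
        seen |= group

        issues = sorted(name for kind, name in group if kind == "issue")
        changes = sorted(name for kind, name in group if kind == "change")
        if len(changes) > 1:
            label = "cross-cutting"
        elif len(issues) > 1:
            label = "compound"
        else:
            label = "simple"
        decisions.append((label, issues, changes))
    return decisions


# Hypothetical edges between issue keys and change identifiers.
example_edges = [
    ("DEMO-1", "c1"),
    ("DEMO-2", "c2"), ("DEMO-3", "c2"),
    ("DEMO-4", "c3"), ("DEMO-4", "c4"),
]
for decision in classify_decisions(example_edges):
    print(decision)
```

Here `c1` ends up in a simple decision, `c2` in a compound one (two issues, one change), and `c3` and `c4` together in a cross-cutting one.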
The authors helpfully include some real examples that show what RecovAr is capable of. The table below shows three decisions that can be extracted from Hadoop.
|Simple|| || |
|Compound|| || |
|Cross-cutting|| || |
The authors evaluated RecovAr’s applicability and accuracy (which, as you may recall, is based on two other concepts: precision and recall) by applying the technique to Hadoop and Struts. Both projects are widely used and open source, and have a long and active development history.
On average, only 18% of the issues for Hadoop and 6% of the issues for Struts had architecturally significant effects.
The number of design decisions that can be extracted using RecovAr seems to depend a lot on the technique that’s used to recover the architecture, which makes sense given that RecovAr compares architectures to detect decision consequences.
To determine the precision of RecovAr, two PhD students independently evaluated the identified decisions by manually assigning ratings based on four criteria:
- The rationale in issue summaries must be understandable;
- Multiple issues for a single rationale must clearly be related;
- Consequences (code and architectural changes) must be related to the rationale;
- The size of code and architectural changes must be small enough to be understandable in a short amount of time.
Overall precision scores range from 0.71 to 0.81, which is pretty good.
Most decisions that were deemed unacceptable had originated in newly released major versions. This is because the number of architectural changes between a minor version and the next major version tends to be large, which understandably leads to hard-to-understand results.
The authors initially found atrocious recall values of around 20%, primarily for two reasons:
- Architectural changes that are made in off-the-shelf components (e.g. third-party libraries) were not marked as part of design decisions;
- “Orphaned” commits that conceptually belong to an issue were not clearly marked as such.
Mitigation of these effects led to a rise in recall to 73% on average, which is also pretty okay.
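As a reminder of what the two measures mean, here is a small sketch of the standard set-based definitions of precision and recall. Note that the study derived its precision scores from manual ratings, so this is not necessarily how the authors computed their numbers. The decision identifiers below are hypothetical and not taken from the study.

```python
def precision_recall(recovered, relevant):
    """Return (precision, recall) for a set of recovered items."""
    recovered, relevant = set(recovered), set(relevant)
    hits = recovered & relevant
    # Precision: which share of what we recovered is actually relevant?
    precision = len(hits) / len(recovered) if recovered else 0.0
    # Recall: which share of what is relevant did we manage to recover?
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical decision identifiers, not data from the study.
recovered = {"d1", "d2", "d3", "d4"}
relevant = {"d1", "d2", "d3", "d5", "d6"}
precision, recall = precision_recall(recovered, relevant)
print(precision, recall)  # 0.75 0.6
```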