How to write good commit messages

Published: 9 Apr 2023
Written by: Chun Fei Lung

In today’s “Things That People Don’t Want To Bother With” blog we’re having a look at commit messages and how to write them.

“I do” is commonly used as a commitment message

A version control system like Git maintains a record of code changes in the form of commits. Each commit contains changes to source code (and possibly other artefacts) and a message that describes the changes. This allows collaborators to understand the context of the change and its impact on the project. For long-lived projects, commit messages might be the only source of information left for future developers who wish to understand what changes were made and why.

This is also basically the tl;dr: your commit messages should communicate what changes are made and why. Of course it’s a bit more nuanced than that, so keep on reading if you want to learn more.

About the article

Title	What makes a good commit message?
Year	2022
Author(s)	Yingchen Tian (Beijing Institute of Technology), Yuxia Zhang (Beijing Institute of Technology), Klaas-Jan Stol (University College Cork), Lin Jiang (Beijing Institute of Technology), Hui Liu (Beijing Institute of Technology)
Venue	International Conference on Software Engineering

Bad messages

In practice the quality of commit messages varies wildly, often due to a lack of time or motivation (side note: Or awareness?). A previous study found that about 14% of commit messages in 23,000 open-source projects were completely empty and as many as 75% only contained a few words. A mere 10% of commits had messages with “normal” English sentences!

The authors manually classified 1,597 commits from five major open source Java projects into four types: 1) why and what; 2) what, but no why; 3) why, but no what; and 4) neither why nor what. Although the first type is the most common, the latter three types still make up about 44% of all commit messages.

The why is most often left out of messages, presumably because developers find it more challenging to describe the rationale behind their changes.

A very small portion of commit messages does not contain any useful information. These can be grouped into five categories:

Single-word messages, like “merge”, “polish” or a file name;
Submit-centred messages that simply express the fact that the commit “changes” something;
Scope-centred messages which primarily convey the size of the change, e.g. “minor change”;
Redundant messages that repeat information that’s already in the diff;
Irrelevant messages that convey no information at all about the change (side note: My favourite example is “derp”.).
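
Some of these categories are mechanical enough that a simple script can catch them before you commit. The sketch below is my own rough heuristic, not something from the paper (the authors classified messages by hand). The function name and the word lists are assumptions made up for illustration. Redundant and irrelevant messages are not covered, because spotting those requires looking at the diff or understanding the text.

```python
import re
from typing import Optional

# Hypothetical word lists, chosen for illustration only.
SUBMIT_WORDS = {"change", "changes", "changed", "update", "updates",
                "updated", "modify", "modified", "commit"}
FILLER_WORDS = {"a", "the", "some", "few", "more", "stuff", "things"}
SCOPE_PATTERN = re.compile(
    r"^(minor|small|tiny|big|major|large)\s+(change|changes|fix|fixes|update|updates)$"
)


def flag_uninformative(message: str) -> Optional[str]:
    """Return a rough category name if the message looks uninformative, else None."""
    # Tokens may contain dots and dashes, so a file name counts as one word.
    words = re.findall(r"[\w.'-]+", message.lower())
    if not words:
        return "empty"
    if len(words) == 1:
        return "single-word"
    if SCOPE_PATTERN.match(" ".join(words)):
        return "scope-centred"
    # Only "change"-like words and filler: the message says that something
    # changed, but not what or why.
    if all(w in SUBMIT_WORDS or w in FILLER_WORDS for w in words):
        return "submit-centred"
    return None


for msg in ["merge", "minor change", "some changes",
            "Fix crash when the config file is missing"]:
    print(repr(msg), "->", flag_uninformative(msg))
```

A check like this could run in a commit-msg hook, but treat a hit as a nudge to write more rather than as a verdict on quality.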

Good messages

The why and what should be clear for each commit, but that doesn’t mean that they need to be expressed explicitly. Both the why and what can be omitted when the reason for a change is common sense or can be explained by the change itself.

The authors identified five types of “why” expression categories (side note: Links to issue reports and pull requests are treated as a way to provide “why” information.):

Describe issue: Commits in this category directly describe the motivation of a code change. This can be done by describing an error scenario, citing errors or warnings from quality assurance tools, or describing shortcomings or weaknesses in the current implementation.
Illustrate requirement: A message can also describe the requirements that led to the commit, e.g. user needs, obsolescence of features, or a change in the runtime or the environment.
Describe objective: Some commit messages are more forward-looking and describe the purpose of the change, e.g. to fix a defect or improve the code in some way.
Imply necessity: Commit messages can describe the need for changes in an indirect way, for instance by mentioning conventions or standards, explaining how the change relates to a previous commit or a bigger change, or pointing out the benefits that a change might bring.
Missing why: In some cases the rationale is common sense or can be easily inferred, e.g. when adding test cases, fixing typos, updating text, annotations or version numbers, or refactoring code.

They also found four types of “what” expression categories:

Summarise code object change: Commit messages can summarise the changes in a commit. This can be done by highlighting characteristics of the change, summarising the change, describing the “before” and “after” states of the code, or simply by listing the changes.
Describe implementation principle: A commit message can describe the technical principles that underpin the changes. This type of description isn’t seen very often.
Illustrate function: Messages in this category explain code changes from a functional or behavioural perspective. This is one of the more common categories.
Missing what: Changes that are small and simple, like the correction of typographic errors, do not require a specification of what has changed.
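
To make the "why" and "what" a bit more concrete, here is a minimal sketch that assembles a message from both parts. It assumes the common Git layout of a short subject line, a blank line, and a body wrapped at 72 characters. That layout is a convention, not something the paper prescribes. The function name, the example text, and the issue reference are all hypothetical.

```python
import textwrap
from typing import Optional


def build_commit_message(what: str, why: str,
                         reference: Optional[str] = None) -> str:
    """Combine a "what" subject and a "why" body into one commit message."""
    subject = what.strip()
    # Wrap the rationale so it stays readable in a terminal.
    body = textwrap.fill(why.strip(), width=72)
    parts = [subject, "", body]
    if reference:
        # The paper treats links to issues and pull requests as "why" information.
        parts += ["", reference.strip()]
    return "\n".join(parts)


message = build_commit_message(
    what="Skip cache lookup for expired sessions",  # summarises the change
    why=("Expired sessions were still served from the cache, so users "
         "stayed logged in after their session had ended."),  # describes the issue
    reference="Refs: #123",  # hypothetical issue number
)
print(message)
```

Here the subject summarises the code change ("what") and the body describes the issue that motivated it ("why").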

These nine expression categories are not evenly distributed over the three types of maintenance activities: corrective, adaptive, and perfective (side note: Corrective changes are made to fix issues, adaptive changes are done to implement features, while perfective changes improve the software in different ways, e.g. performance.). The table below shows how often each expression category type occurs with each major type of maintenance activity. This can be useful for those who are not sure what to write in their commit message. First determine the type of change you’re making, then make sure that your message contains at least the two most common “why” and “what” categories for that type!

	Category	Corrective, N=116 (%)	Adaptive, N=63 (%)	Perfective, N=73 (%)
How to express “Why”	Describe issue	45.7	12.7	6.9
	Illustrate requirement	12.1	22.2	21.9
	Describe objective	6.9	7.9	11.0
	Imply necessity	19.0	39.7	26.0
	Missing why	12.1	15.9	34.2
	Describe issue & Describe objective	0.8	0.0	0.0
	Describe issue & Imply necessity	2.6	0.0	0.0
	Illustrate requirement & Imply necessity	0.8	1.6	0.0
	Total	100.0	100.0	100.0
How to express “What”	Summarise code object change	58.6	60.3	76.7
	Illustrate function	22.4	27.0	8.2
	Describe implementation principle	4.3	1.6	0.0
	Missing what	6.1	3.2	13.7
	Summarise code object change & Illustrate function	8.6	7.9	1.4
	Total	100.0	100.0	100.0
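
If you'd rather not read the table every time, the lookup can be sketched in a few lines. The numbers below are copied from the single-category rows of the table above, and the combined rows are left out for simplicity. The function name is made up. Skipping the "Missing" rows is my own choice, since a message cannot really "contain" a missing why or what.

```python
ACTIVITIES = ("corrective", "adaptive", "perfective")

# Percentages per category, in the order (corrective, adaptive, perfective).
WHY = {
    "Describe issue": (45.7, 12.7, 6.9),
    "Illustrate requirement": (12.1, 22.2, 21.9),
    "Describe objective": (6.9, 7.9, 11.0),
    "Imply necessity": (19.0, 39.7, 26.0),
    "Missing why": (12.1, 15.9, 34.2),
}
WHAT = {
    "Summarise code object change": (58.6, 60.3, 76.7),
    "Illustrate function": (22.4, 27.0, 8.2),
    "Describe implementation principle": (4.3, 1.6, 0.0),
    "Missing what": (6.1, 3.2, 13.7),
}


def most_common(table, activity, n=2):
    """Return the n most frequent explicit categories for a maintenance activity."""
    column = ACTIVITIES.index(activity)
    # Assumption: ignore the "Missing ..." rows, which describe omissions.
    explicit = {name: row[column] for name, row in table.items()
                if not name.startswith("Missing")}
    return sorted(explicit, key=explicit.get, reverse=True)[:n]


for activity in ACTIVITIES:
    print(activity, most_common(WHY, activity), most_common(WHAT, activity))
```

For a corrective change, for example, this suggests describing the issue and implying the necessity of the fix, while summarising the code change and illustrating the affected function.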

Once you know all this, it’s very tempting to build a classification model that can automatically determine the quality of commit messages. This just so happens to be the final contribution of this study. The authors tried several techniques and found that models based on Bi-LSTM perform best at classifying whether a commit message describes the why and what of a change. The accuracy is reportedly somewhere between 75.9 and 91.0 percent, but sadly there doesn’t appear to be a way to use these models yourself.