The challenges of microservice API evolution

Many modern software systems are split into loosely coupled services to improve maintainability, scalability, and fault tolerance. Service-oriented architecture (SOA) was an early attempt to split large systems into multiple smaller services that communicate with each other over an enterprise service bus (ESB). SOA ultimately didn’t work because the services used a shared domain model, which meant that changes still required integration and coordination throughout the system.
Nowadays most companies have migrated to a microservice architecture (MSA) in which each service has its own domain model that is exposed via an application programming interface (API). In an MSA, microservices exposing functionality via APIs are called providers, while microservices calling those APIs are called consumers. Together, they form a system where each part is able to evolve independently.
Microservice architectures are not a silver bullet that magically solves all problems with integration and coordination. Some amount of synchronisation effort is still necessary, and breaking changes may still occur when new versions of a microservice are deployed.
This study uses semi-structured interviews with engineers, architects, and managers from multiple companies to uncover current microservice API strategies and challenges in practice.
REST APIs and event-driven communication are the most popular ways to exchange messages between services. SOAP APIs are still used in some places, but only for legacy consumers.
Other, more specialised technologies like GraphQL, WebSockets, and Protocol Buffers were mentioned by only two participants, and were therefore excluded from the study.
REST APIs are used by all participants and considered to be a de facto standard for service communication, because of their ease of use and minimal setup time for consumers. A few participants also provide client SDKs that abstract REST calls. However, this approach increases maintenance overhead with each additional language that is supported.
When a service needs to be accessible for external consumers, many participants implement a dedicated API gateway that handles all incoming requests and abstracts individual services' APIs and versioning. This ensures loose coupling by providing a single access point for external consumers and hiding the internal architecture.
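As a rough illustration of how such a gateway can hide internal services and their versions, the following minimal sketch maps public path prefixes to internal targets. The route table, host names, and the `resolve` function are hypothetical and not taken from the study; a production gateway would also deal with authentication, rate limiting, and the actual forwarding of requests.

```python
# Hypothetical routing table (assumption, not from the study):
# public prefix -> (internal base URL, internal versioned prefix).
ROUTES = {
    "/api/orders": ("http://orders.internal:8080", "/v2/orders"),
    "/api/customers": ("http://customers.internal:8080", "/v1/customers"),
}


def resolve(public_path: str) -> str:
    """Translate a public path into the internal URL that should serve it."""
    for prefix, (base_url, internal_prefix) in ROUTES.items():
        # Match the prefix itself or any sub-path below it.
        if public_path == prefix or public_path.startswith(prefix + "/"):
            remainder = public_path[len(prefix):]
            return base_url + internal_prefix + remainder
    raise LookupError(f"no route for {public_path}")
```

In this sketch, external consumers only ever see `/api/orders`. Moving the orders service to a new internal version means changing one entry in the table, not every consumer.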
All participants use OpenAPI specifications and the Swagger tool to document their REST APIs. OpenAPI specifications are often shared with external consumers, who are typically familiar with the format or even use it themselves. Additionally, some participants use OpenAPI and Swagger to generate code, contract tests, and client SDKs.
OpenAPI is only used to specify inputs and outputs. Many participants therefore supplement OpenAPI documentation with text or diagrams to describe other important information, like processes, API behaviour, and semantics. This is often done using wiki tools like Confluence.
Some participants do not document internal REST APIs with Swagger and instead rely on comments in the source code, which they see as faster and more accurate than Swagger-generated documentation.
Event-driven communication patterns such as publish-subscribe and message queues can be used to send asynchronous messages internally between services.
Asynchronous messaging can be used when real-time responses are not required and eventual consistency is acceptable. Participants mainly use RabbitMQ and Apache Kafka as message brokers, which help to loosely couple services and allow services to be added and removed without adjustments elsewhere in the system. This makes asynchronous messaging especially suitable for use with systems that are built around a monolith.
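The loose coupling that a broker provides can be illustrated with a toy in-memory publish-subscribe sketch. It is not a RabbitMQ or Kafka client; the `InMemoryBroker` class, the topic name, and the explicit `deliver_pending` step are assumptions made purely for illustration.

```python
from collections import defaultdict, deque
from typing import Callable, Deque, Dict, List, Tuple

Handler = Callable[[dict], None]


class InMemoryBroker:
    """Toy stand-in for a message broker (hypothetical, for illustration only)."""

    def __init__(self) -> None:
        self._subscribers: Dict[str, List[Handler]] = defaultdict(list)
        self._pending: Deque[Tuple[str, dict]] = deque()

    def subscribe(self, topic: str, handler: Handler) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, message: dict) -> None:
        # The publisher only enqueues; it never calls a subscriber directly.
        self._pending.append((topic, message))

    def deliver_pending(self) -> int:
        """Deliver queued messages to the current subscribers; return the delivery count."""
        delivered = 0
        while self._pending:
            topic, message = self._pending.popleft()
            for handler in list(self._subscribers.get(topic, [])):
                handler(message)
                delivered += 1
        return delivered
```

A second subscriber can be registered for the same topic without touching the publishing code, which is the property the participants value. The flip side is that nothing in the publisher reveals who depends on its messages.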
In contrast to REST APIs, very few participants document event-driven communication, and only one uses AsyncAPI, the de facto standard for documenting asynchronous messages. It’s possible that participants don’t write documentation because event-driven communication is only used for system-internal communication. As a result, communication paths, dependencies, message formats, and semantics become implicit (thus hidden and lost over time), which may explain why none of the participants with a team size larger than ten use event-driven communication.
Interview participants use five strategies to evolve provided microservice APIs so that everything continues to work as intended for existing consumers, even when breaking changes are introduced:
-
Breaking changes are unavoidable, so one must deal with them regardless.
Breaking changes are mainly caused by four reasons: introduction of new functionality, changing the underlying technology, improving existing functionality, or improving the API design. Other reasons include bug fixes, security updates, and migrations due to external system changes. Participants try to limit breaking changes to quarterly or half-yearly releases so that affected consumers have enough time to update their services.
Breaking changes can be structural or behavioural. All participants agree that structural changes such as deletions and renamings are breaking. However, only a few identify behavioural changes, where the contents of existing fields are changed, as breaking changes. What’s important here is that structural changes are generally easier for clients to handle than behavioural changes.
-
It’s even better to avoid breaking changes entirely and stay backwards compatible with consumers.
For smaller APIs, this can be done by adding or duplicating endpoints, messages, and fields instead of changing existing ones. A downside of this strategy is that it fragments APIs and increases the system’s complexity.
Alternatively, providers can design dynamic APIs that are flexible and thus result in fewer breaking changes. as part of the request. A downside of dynamic APIs is that they create more maintenance overhead.
Finally, breaking changes can also be introduced accidentally. To make sure this doesn’t happen, regression tests should be run before services are re-deployed to production.
-
Versioning APIs enables providers to evolve their APIs and consumers to choose which version they want to use.
All participants use some form of versioning, though only a few explicitly mention using semantic versioning.
Ideally, a new API version replaces the previous version entirely, and only the latest version needs to be maintained by the provider. In reality, many participants expose multiple versions simultaneously. This is done in one of two ways. Most prefer exposing all API versions in the same service instance, while some choose to deploy each service version separately. Old API versions are generally only removed once all consumers have migrated to a newer version.
-
Most participants collaborate with teams of consumer services during the API evolution process.
Planned changes are generally discussed with consumer teams to improve the underlying workflow and API design before release. Interestingly, architects all prefer meetings while senior developers tend to prefer distributing API previews for asynchronous feedback.
To simplify collaboration, many participants discuss and agree on the API definition before starting implementation. Participants believe that such an API-first approach improves the overall design by focussing on readable, self-documenting, and reusable APIs.
-
When all consumers of an API are maintained by other teams within the same company, one might use this simple strategy: “just break (and fix) it”. With this strategy, when developers of a provider introduce breaking changes in their API, they also make corresponding changes in consumers and test suites.
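The versioning practice described above, in which several API versions are exposed from the same service instance, might look roughly like the following sketch. The order record, the field names, and the `/v1/...` path scheme are hypothetical; the study does not prescribe how versions are selected.

```python
from typing import Callable, Dict

# Hypothetical internal domain record shared by both API versions.
ORDER = {"id": 42, "customer_name": "Ada Lovelace", "total_cents": 2000}


def get_order_v1(order: dict) -> dict:
    # v1 contract: "name" and a decimal "total".
    return {
        "id": order["id"],
        "name": order["customer_name"],
        "total": order["total_cents"] / 100,
    }


def get_order_v2(order: dict) -> dict:
    # v2 renamed "name" and switched to integer cents: a breaking change,
    # so v1 stays available until all consumers have migrated.
    return {
        "id": order["id"],
        "customerName": order["customer_name"],
        "totalCents": order["total_cents"],
    }


HANDLERS: Dict[str, Callable[[dict], dict]] = {"v1": get_order_v1, "v2": get_order_v2}


def handle(path: str) -> dict:
    """Dispatch on the version segment of a path such as /v1/orders/42."""
    version = path.strip("/").split("/")[0]
    handler = HANDLERS.get(version)
    if handler is None:
        raise LookupError(f"unsupported API version: {version}")
    return handler(ORDER)
```

Retiring v1 in this sketch amounts to deleting one handler and one dictionary entry, but only once no consumer calls `/v1/...` any more.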
The interviews revealed one strategy to handle the evolution of consumed APIs using an abstraction layer:
- Some participants use dedicated integration services to abstract communication with external services. These services act as proxies that handle authentication and translate field names between external and internal domain models. The main advantage of this approach is that internal services do not need to know anything about the external services they communicate with.
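The field-name translation performed by such an integration service can be sketched as follows. The external field names (`cust_no`, `full_nm`, `zip`) and the internal ones are invented for illustration, and dropping unknown fields at the boundary is one possible design choice, not something the study reports.

```python
# Hypothetical mapping from an external provider's field names to internal ones.
EXTERNAL_TO_INTERNAL = {
    "cust_no": "customer_id",
    "full_nm": "name",
    "zip": "postal_code",
}
INTERNAL_TO_EXTERNAL = {
    internal: external for external, internal in EXTERNAL_TO_INTERNAL.items()
}


def to_internal(external_record: dict) -> dict:
    """Rename known external fields; unknown fields are dropped at the boundary."""
    return {
        EXTERNAL_TO_INTERNAL[key]: value
        for key, value in external_record.items()
        if key in EXTERNAL_TO_INTERNAL
    }


def to_external(internal_record: dict) -> dict:
    """Rename internal fields back to the external provider's names."""
    return {
        INTERNAL_TO_EXTERNAL[key]: value
        for key, value in internal_record.items()
        if key in INTERNAL_TO_EXTERNAL
    }
```

If the external provider renames a field, only the mapping in the integration service changes; internal services keep working against the internal model.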
The study identified six challenges in the API evolution process, three of which impact maintainability and usability:
-
Most participants find it hard to understand the impact of API changes.
Developers of providers sometimes accidentally publish breaking changes without versioning or notifying other teams. These breaking changes may be the result of code changes, but can also be inadvertently caused by library or language version updates. These issues may be partially mitigated by collaborating with other teams or by diffing OpenAPI specifications. However, breaking changes are often behavioural in nature and therefore difficult to catch statically.
On the consuming side, participants are notified by external API providers about upcoming changes. There is no generalisable strategy for filtering notifications or assessing the impact of changes. In some cases this is done by providing teams, in others, by consuming teams. A few participants rely on dedicated roles (product owners, architects), but this comes with its own set of challenges.
-
Most participants from larger companies find it challenging to convince consumers to update API calls to a new version after introducing breaking changes. This hinders the clean-up process.
Consumers’ reluctance to upgrade is often caused by a lack of resources or . In response, some teams force consumers to migrate by retiring old versions at a fixed, non-negotiable deadline, but this is not feasible for business-critical APIs.
-
Many participants encountered problems in communicating changes with other teams.
Some participants don’t know whom to inform, and have to rely on implicit knowledge from the team, an architect or some other higher-up. Others use technical solutions, like server logs, API credentials, or manual documentation to fill in the information gap.
When communicating changes, participants mainly use informal channels such as emails, announcement channels, and instant messages. A few use formal meetings (e.g. sprint reviews) or informal coffee meetings, but this comes with the risk that important information is lost.
The risk is especially high when information passes through multiple organisational layers or stakeholders, as it can be altered with every pass down the chain, as in a game of telephone.
-
Maintaining outdated versions leads to a degradation in API and source code quality in provided services.
Ensuring backwards compatibility with existing consumers can cause a significant amount of technical debt and maintenance overhead that interferes with the development of new features. This is especially problematic for smaller teams and teams that provide dynamic APIs.
APIs that are very backward compatible may suffer from reduced usability as consumers need to understand the differences between duplicated workflows, requests, and fields.
-
Governmental service providers are seen as uncooperative. It is hard or even impossible to get in touch with them, and they introduce breaking changes on short notice or with none at all and regularly change agreed-upon API specifications during development.
This is likely because governments provide their services not as a paid product but as a courtesy and therefore prioritise minimising their own costs over those of consumers.
-
Participants did not discuss evolution strategies for event-driven communication. This is because neither the underlying messaging protocols nor lightweight messaging frameworks support versioning natively. As a result, versioning event-driven communication requires more manual work.
Moreover, the asynchronous nature of event-driven communication requires subscribers to handle old message versions until all such messages have been processed from the queue, resulting in further degradation in the API design. Consequently, participants either migrate publishers and subscribers simultaneously (and accept potential message loss) or .
These factors likely contribute to the greater popularity of REST APIs relative to event-driven communication.
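To make the burden of handling old message versions concrete, the sketch below converts old messages to the current format on receipt, a technique sometimes called upcasting. The study does not report that participants do this; the `version` field, the message layout, and the function names are all assumptions for illustration.

```python
CURRENT_VERSION = 2


def upcast(message: dict) -> dict:
    """Return the message in the current (v2) format, converting v1 if needed."""
    # Assumption: messages carry a "version" field; unversioned ones are treated as v1.
    version = message.get("version", 1)
    if version == 1:
        # v1 carried a decimal "amount"; v2 uses integer "amount_cents".
        return {
            "version": 2,
            "order_id": message["order_id"],
            "amount_cents": round(message["amount"] * 100),
        }
    if version == CURRENT_VERSION:
        return message
    raise ValueError(f"unsupported message version: {version}")


def handle_order_paid(message: dict) -> int:
    """Business logic only ever sees v2 messages."""
    current = upcast(message)
    return current["amount_cents"]
```

The conversion code has to stay in the subscriber until the last v1 message has left the queue, which is exactly the kind of lingering compatibility logic described above.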
- REST APIs are more popular than event-driven communication, are easier to use, and have more mature tooling
- Many API evolution strategies and challenges deal with communication and organisational collaboration
- API evolution strategies are a constant balancing act between the needs of providers and consumers