Chuniversiteit.nl
The Toilet Paper

A comparison of open tracing tools

I’m in the market for a tool that can help me analyse logs, traces, and metrics, and I was hoping that this paper could help me pick one.

The outcome of the software engineering process is invisible, which makes it hard to understand progress and reason about the produced output. It becomes especially hard when developing systems with many distributed components.

Measurements of the state and actions of a system can help make the invisible visible. This is what the term “observability” is about.

Two important terms in observability are “tracing” and “telemetry”. Tracing allows engineers to follow individual execution paths through a system. Telemetry, on the other hand, is about collecting large amounts of data that only provide insight when combined.
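To make the idea of following an execution path a bit more concrete, here is a minimal sketch of a trace as a tree of timed spans. It is a toy, in-memory tracer written for illustration only: the `Tracer` and `Span` classes and the `handle_checkout` function are hypothetical and do not correspond to the API of any tool discussed in the paper.

```python
import time
import uuid
from contextlib import contextmanager
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Span:
    """One timed step in an execution path (hypothetical structure)."""
    trace_id: str             # shared by every span of the same request
    span_id: str              # unique per span
    parent_id: Optional[str]  # None for the root span
    name: str
    start: float
    end: Optional[float] = None


class Tracer:
    """Toy tracer that keeps finished spans in memory."""

    def __init__(self) -> None:
        self.finished: List[Span] = []
        self._stack: List[Span] = []  # currently open spans, innermost last

    @contextmanager
    def span(self, name: str):
        parent = self._stack[-1] if self._stack else None
        current = Span(
            trace_id=parent.trace_id if parent else uuid.uuid4().hex,
            span_id=uuid.uuid4().hex[:16],
            parent_id=parent.span_id if parent else None,
            name=name,
            start=time.perf_counter(),
        )
        self._stack.append(current)
        try:
            yield current
        finally:
            current.end = time.perf_counter()
            self._stack.pop()
            self.finished.append(current)


def handle_checkout(tracer: Tracer) -> None:
    # Hypothetical request handler with two nested steps.
    with tracer.span("checkout"):
        with tracer.span("load_cart"):
            pass
        with tracer.span("charge_card"):
            pass


tracer = Tracer()
handle_checkout(tracer)
for s in tracer.finished:
    print(s.name, "parent:", s.parent_id)
```

All three spans share one trace ID, and the two inner spans point at the root span as their parent, which is what lets a tool reconstruct the path of a single request. This sketch only works inside one process; in a distributed system the trace ID would also have to be passed along with each call between services.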

Tracing is used in various ways, e.g. to understand why a system does not meet performance requirements or to locate where failures occur. It is an important part of the toolkit used by software engineers to monitor, debug, and optimise distributed systems.

This paper describes the results of a so-called systematic multivocal literature review (MLR), which includes both peer-reviewed and grey literature. The review considers the distinctive features, popularity, advantages, and issues of a large number of tracing tools that implement the OpenTracing API (which has since been replaced by OpenTelemetry).

Distinctive features

The review covers a total of 30 tracing tools. Only 12 of these tools are fully open source. A few tools offer a free tier with limited features.

The table below shows the software license, supported programming languages, pricing models, and year of first release of each tracing tool. Programming languages marked with an asterisk support so-called non-invasive instrumentation. This means that code can be automatically modified so that tracing information is sent to the tool.

Tool | License | Programming languages | Pricing | Created
Appdash | MIT | Go*, Python*, Ruby* | Free | 2014
AppDynamics | Proprietary, Apache-2.0, GPL-3.0, MIT | Java*, Shell, .NET*, Python*, JavaScript, Go*, C/C++*, PHP*, Node.js* | Paid; Quote | 2008
Containiq | Proprietary | C/C++*, Go*, Rust*, Python*, Ruby*, Node.js* | Paid; Quote | 2021
Datadog | Proprietary, Apache-2.0, BSD-3-Clause, GPL-2.0, MIT, MPL-2.0 | Go*, Python*, Ruby*, JavaScript, Node.js*, Java*, .NET* | Free; Paid | 2010
Dynatrace | Apache-2.0 | C++*, .NET*, Erlang*, Go*, Java*, Node.js*, Python*, Ruby*, Rust* | Paid | 2005
ElasticAPM | Apache-2.0, BSD-2-Clause, BSD-3-Clause, Elastic-2.0, MIT | Go*, Python*, iOS*, Java*, Node.js*, PHP*, Ruby*, Gherkin | Paid | 2012
Grafana tempo | AGPL-3.0-only | Java*, Go*, .NET*, Python*, Node.js* | Free | 2020
Haystack | Apache-2.0 | Java*, Node.js*, Python*, Go*, HCL, Shell, Smarty | Free | 2017
Hypertrace | Traceable Community License Agreement (1.0) | Java*, Go*, Python*, Node.js*, C++*, .NET* | Free | 2020
Honeycomb.io | Apache-2.0, MIT | Go*, Java*, .NET*, Node.js*, Python*, Ruby*, JavaScript, Python | Free; Quote | 2016
Instana | Proprietary, Apache-2.0, GPL-2.0, MIT | Shell, JavaScript, Go, Java*, Python*, .NET*, Clojure*, Kotlin*, Python*, PHP*, Scala*, Node.js*, Ruby* | Paid | 2015
Jaeger | Apache-2.0 | Go*, Java*, Node.js*, Python*, C++*, C#* | Free | 2016
Kamon | Apache-2.0 | Java*, Scala* | Free; Paid; Quote | 2017
LightStep | Proprietary, Apache-2.0, BSD-2-Clause, BSD-3-Clause, CC-BY-SA-4.0, MIT | Go*, JavaScript, Python*, Java*, HCL, .NET*, Node.js* | Free; Paid; Quote | 2015
Logit.io | MIT | .NET*, Go*, Node.js*, Python*, Ruby*, JavaScript, Shell | Paid | 2013
Lumigo | Apache-2.0 | Python*, Node.js*, Java, Go | Paid; Quote | 2018
New Relic | Apache-2.0 | C*, Go*, Java*, .NET*, Node.js*, PHP*, Python*, Ruby*, JavaScript, Shell | Free; Paid; Quote | 2008
Ocelot | Apache-2.0 | Java*, JavaScript | Free | 2018
OpenCensus | Apache-2.0 | Python*, Node.js*, Go*, C#*, C++*, Erlang*, Java* | Free | 2017
OpenTelemetry | Apache-2.0 | C++*, .NET*, Erlang*, Go*, Java*, JavaScript*, PHP*, Python*, Ruby*, Rust*, Swift* | Free | 2019
Sentry | BSL-1.1 | .NET*, JavaScript*, Node.js*, Python*, PHP*, Rust*, Java*, Go* | Free; Paid; Quote | 2012
Splunk | Apache-2.0 | Python*, Java*, Node.js*, .NET*, Go*, Ruby*, PHP* | Paid | 2003
Signoz | MIT | Java*, Python*, JavaScript*, Go*, PHP*, .NET*, Ruby*, Elixir*, Rust* | Free; Paid; Quote | 2020
Site24x7 | BSD-2-Clause, MIT | Java*, .NET*, Ruby*, PHP*, Node.js*, Python* | Paid | 2006
SkyWalking | Apache-2.0 | Java*, Python*, Node.js*, Lua*, JavaScript*, Rust*, PHP* | Free | 2015
StageMonitor | Apache-2.0 | Java*, HTML, JavaScript | Free | 2013
Tanzu | Apache-2.0 | Java*, C++*, Go*, .NET*, Python*, Ruby | Free | 2019
Uptrace | BSD-2-Clause, Apache-2.0 | Go*, Node.js*, .NET*, Ruby*, Python* | Paid | 2021
Victoriametrics | Apache-2.0 | Go*, JavaScript* | Free; Quote | 2018
Zipkin | Apache-2.0 | C#*, Go*, Java*, JavaScript*, Ruby*, Scala*, PHP* | Free | 2012
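The non-invasive instrumentation mentioned above can be illustrated with a small sketch: application code is wrapped at runtime so that timing data is recorded without anyone editing the application’s source. Everything here is hypothetical (the `instrument` and `traced` helpers, the `CALLS` list standing in for a tracing backend, and the `OrderService` class); how a specific tool actually implements this is not covered by the summary.

```python
import functools
import inspect
import time

# Stand-in for a tracing backend: (function name, duration in seconds).
CALLS = []


def traced(func):
    """Wrap a function so that each call is timed and recorded."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            CALLS.append((func.__qualname__, time.perf_counter() - start))
    return wrapper


def instrument(cls):
    """Replace every public plain method of `cls` with a traced version.

    The class's own source code is left untouched; only the class object
    is modified at runtime.
    """
    for name, attr in list(vars(cls).items()):
        if inspect.isfunction(attr) and not name.startswith("_"):
            setattr(cls, name, traced(attr))


class OrderService:
    """Application code that knows nothing about tracing."""

    def place_order(self, item):
        return f"ordered {item}"


instrument(OrderService)
result = OrderService().place_order("book")
print(result, CALLS[0][0])
```

The application class contains no tracing calls of its own, yet after `instrument(OrderService)` every call to `place_order` leaves a record behind. That is the appeal of this approach: tracing can be switched on without touching the business logic.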

Tracing tools can consist of several components:

  • Libraries are used in source code to send data to an agent or directly to a collection component.

  • Agents are responsible for collecting data for a particular context, e.g. an application, the operating system, or a database. They run as part of applications or as a separate component and forward data to collection components.

  • Collectors persist data to a long-term storage component, like a time-series database. To improve performance, this can be done through a transport component that fulfils routing or caching tasks.

  • Data processing components analyse incoming data and prepare it for usage in visualisations, dashboarding, and alerting.
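How these components hand data to each other can be sketched as a tiny in-process pipeline. This is a simplified illustration under several assumptions: all class names, the `batch_size` parameter, and the record fields are made up for this example, the agent doubles as a very basic transport by batching records, and storage is just a list rather than a time-series database.

```python
from collections import defaultdict
from statistics import mean


class Storage:
    """Long-term storage, reduced to an in-memory list of rows."""

    def __init__(self):
        self.rows = []

    def write(self, records):
        self.rows.extend(records)


class Collector:
    """Receives batches from agents and persists them to storage."""

    def __init__(self, storage):
        self.storage = storage

    def receive(self, batch):
        self.storage.write(batch)


class Agent:
    """Buffers records for one context and forwards them in batches."""

    def __init__(self, collector, batch_size=2):
        self.collector = collector
        self.batch_size = batch_size
        self.buffer = []

    def report(self, record):
        self.buffer.append(record)
        if len(self.buffer) >= self.batch_size:
            self.flush()

    def flush(self):
        if self.buffer:
            self.collector.receive(self.buffer)
            self.buffer = []


class TracingLibrary:
    """What application code calls to emit data."""

    def __init__(self, agent, service):
        self.agent = agent
        self.service = service

    def record(self, operation, duration_ms):
        self.agent.report({
            "service": self.service,
            "operation": operation,
            "duration_ms": duration_ms,
        })


def average_duration_by_service(storage):
    """Data processing step: aggregate stored rows for a dashboard."""
    grouped = defaultdict(list)
    for row in storage.rows:
        grouped[row["service"]].append(row["duration_ms"])
    return {service: mean(values) for service, values in grouped.items()}


storage = Storage()
agent = Agent(Collector(storage), batch_size=2)
library = TracingLibrary(agent, service="checkout")

library.record("load_cart", 10)
library.record("charge_card", 30)   # second record triggers a batch send
library.record("send_email", 20)
agent.flush()                       # send whatever is still buffered

summary = average_duration_by_service(storage)
print(summary)
```

Keeping the stages separate is what allows a tool to ship only some of them: a library-only project can hand its data to someone else’s collector, for instance.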

In practice, most tracing tools only include some of these components. The table below shows which components are included with each tool:

Tool Libraries Agent Transport Collector Storage Data processing UI
Appdash
AppDynamics
Containiq
Datadog
Dynatrace
ElasticAPM
Grafana tempo
Haystack
Hypertrace
Honeycomb.io
Instana
Jaeger
Kamon
LightStep
Logit.io
Lumigo
New Relic
Ocelot
OpenTelemetry
Sentry
Splunk
Signoz
Site24x7
SkyWalking
StageMonitor
Tanzu
Uptrace
Victoriametrics
Zipkin

The primary purpose of tracing tools is to collect data that allows users to see how a request traverses different services. However, many tools also collect metrics and logs, which can be incredibly helpful when observing traces:

Tool Traces Metrics Logs
Appdash
AppDynamics
Containiq
Datadog
Dynatrace
ElasticAPM
Grafana tempo
Honeycomb.io
Hypertrace
Haystack
Instana
Jaeger
Kamon
LightStep
Logit.io
Lumigo
New Relic
Ocelot
OpenCensus
OpenTelemetry
Sentry
Splunk
SkyWalking
Site24x7
Signoz
StageMonitor
Tanzu
Uptrace
Victoriametrics
Zipkin

For interoperability, it’s important not only that a tool supports as many programming languages as possible, but also that it has a documented API, supports OpenTelemetry, and can be self-hosted:

Tool API OpenTelemetry Self-hosting
Appdash
AppDynamics
Containiq
Datadog
Dynatrace
ElasticAPM
Grafana tempo
Honeycomb.io
Hypertrace
Haystack
Instana
Jaeger
Kamon
LightStep
Logit.io
Lumigo
New Relic
Ocelot
OpenCensus
OpenTelemetry
Sentry
Splunk
SkyWalking
Site24x7
Signoz
StageMonitor
Tanzu
Uptrace
Victoriametrics
Zipkin

Tool popularity

Only three tools have been cited by at least 10 papers: Zipkin (29), Jaeger (18), and LightStep (10). That doesn’t mean these are the most popular tools, however.

A search on technology-based social media platforms reveals that it’s actually Splunk, Haystack, and Sentry that are the most popular, followed by New Relic and Datadog. The top 10 tools (which also include Zipkin, Jaeger, OpenTelemetry, Dynatrace, and AppDynamics) together take up over 90% of social media coverage of tracing tools.

Perceived benefits and issues

Not all social media coverage is positive. A sentiment analysis of online texts about tracing tools provides some insight into how much “appreciation” the community has for each of the 10 most popular tools:

Tool Positive (%) Neutral (%) Negative (%)
AppDynamics 47.4 36.6 16.0
Datadog 43.3 42.2 14.6
Dynatrace 45.7 39.7 14.6
Haystack 38.0 40.3 21.7
Jaeger 40.8 46.6 12.6
New Relic 32.5 47.4 20.1
OpenTelemetry 41.9 46.2 11.9
Sentry 30.8 36.6 32.6
Splunk 44.7 40.1 15.2
Zipkin 41.1 45.8 13.1
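One way to read the table above at a glance is to subtract the negative share from the positive share for each tool. Note that this “net sentiment” score is an assumption made for this sketch and not a metric from the paper; the percentages themselves are copied from the table.

```python
# (positive %, neutral %, negative %) per tool, copied from the table above.
SENTIMENT = {
    "AppDynamics": (47.4, 36.6, 16.0),
    "Datadog": (43.3, 42.2, 14.6),
    "Dynatrace": (45.7, 39.7, 14.6),
    "Haystack": (38.0, 40.3, 21.7),
    "Jaeger": (40.8, 46.6, 12.6),
    "New Relic": (32.5, 47.4, 20.1),
    "OpenTelemetry": (41.9, 46.2, 11.9),
    "Sentry": (30.8, 36.6, 32.6),
    "Splunk": (44.7, 40.1, 15.2),
    "Zipkin": (41.1, 45.8, 13.1),
}


def net_sentiment(scores):
    """Positive minus negative share, in percentage points (illustrative)."""
    positive, _neutral, negative = scores
    return round(positive - negative, 1)


# Tools ordered from most to least favourable net score.
ranking = sorted(
    SENTIMENT,
    key=lambda tool: net_sentiment(SENTIMENT[tool]),
    reverse=True,
)

for tool in ranking:
    print(f"{tool:<14} {net_sentiment(SENTIMENT[tool]):+.1f}")
```

With these numbers AppDynamics ends up at the top and Sentry at the bottom, as the only tool with more negative than positive coverage. A single score like this hides the large neutral share, so it should be treated as a rough reading aid rather than a verdict.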

Online discussions about tracing tools tend to revolve around six topics:

  • Architecture, e.g. ability to scale well in a microservice architecture

  • Deployment & Integration, e.g. the ability to deploy tools in cloud-based and containerised infrastructures without downtime

  • Development, e.g. the effect of the tool on DevOps and collaboration, resource usage, and troubleshooting

  • Measurement, e.g. measuring the performance of microservice architectures via application metrics, and aggregation

  • Tracing, e.g. real-time data, distributed tracing, error monitoring, incident notifications, and ability to identify performance bottlenecks

  • Usability, e.g. enhancing developer productivity, downtime reduction, flexibility, security, reliability, and user experience in general

By assessing the sentiment for texts about each topic, we get an idea of the strengths and weaknesses of a tool. The table below summarises the topic sentiment for each tool. A complete overview can be found in the original article, in Tables 11 and 12.

Criteria AppDynamics Datadog Dynatrace Haystack Jaeger New Relic OpenTelemetry Sentry Splunk Zipkin
Architecture
Deployment & Integration
Development
Measurement
Tracing
Usability

Note that these results do not show that any one tool is clearly better than the rest. Different tools provide different features that suit teams with different preferences. Some might only consider self-hosted solutions, while others may prefer hosted solutions with commercial support. Moreover, the tool ideally needs to support the programming languages that are used by the team.
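Since the right choice depends on a team’s own constraints, a shortlist can be made by filtering on them. The sketch below does this for languages and pricing, using six rows copied from the first table (asterisks omitted). The `shortlist` function and the choice of rows are made up for this example; self-hosting is left out because that information is not reproduced in this summary.

```python
# A few rows from the license/language/pricing table, for illustration.
TOOLS = [
    {"name": "Jaeger",
     "languages": {"Go", "Java", "Node.js", "Python", "C++", "C#"},
     "pricing": {"Free"}},
    {"name": "Zipkin",
     "languages": {"C#", "Go", "Java", "JavaScript", "Ruby", "Scala", "PHP"},
     "pricing": {"Free"}},
    {"name": "SkyWalking",
     "languages": {"Java", "Python", "Node.js", "Lua", "JavaScript", "Rust", "PHP"},
     "pricing": {"Free"}},
    {"name": "Kamon",
     "languages": {"Java", "Scala"},
     "pricing": {"Free", "Paid", "Quote"}},
    {"name": "Splunk",
     "languages": {"Python", "Java", "Node.js", ".NET", "Go", "Ruby", "PHP"},
     "pricing": {"Paid"}},
    {"name": "Sentry",
     "languages": {".NET", "JavaScript", "Node.js", "Python", "PHP", "Rust", "Java", "Go"},
     "pricing": {"Free", "Paid", "Quote"}},
]


def shortlist(tools, required_languages, pricing):
    """Names of tools that cover all required languages and offer
    the given pricing model."""
    return [
        tool["name"]
        for tool in tools
        if required_languages <= tool["languages"] and pricing in tool["pricing"]
    ]


jvm_team = shortlist(TOOLS, {"Java", "Scala"}, "Free")
backend_team = shortlist(TOOLS, {"Python", "Go"}, "Free")
print(jvm_team)
print(backend_team)
```

Two teams with different stacks end up with different shortlists from the same data, which is the point made above: the constraints decide, not a ranking.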

At the same time, there are still many aspects of tracing tools that have not been covered by this review, so make sure you always do your own research!

Summary

  1. There are 10 popular tracing tools that all have their own strengths and weaknesses, but there is no clear winner.