Are code examples on an online Q&A forum reliable? A study of API misuse on Stack Overflow

Published: 2 Dec 2018
Written by: Chun Fei Lung

I actually don’t know anyone who blindly copies stuff from Stack Overflow, but apparently it’s a thing.

Hot answers aren’t always the best answers

Many posts on Stack Overflow contain code snippets that show how a library can be used to achieve a certain task. Zhang, Upadhyaya, Reinhardt, Rajan, and Kim mined GitHub for API usage “best practices”, checked Stack Overflow snippets against them, and concluded that it’s probably not a good idea to reuse online code snippets verbatim.

About the article

Title	Are code examples on an online Q&A forum reliable? A study of API misuse on Stack Overflow
Year	2018
Author(s)	Tianyi Zhang (University of California, Los Angeles); Ganesha Upadhyaya (Iowa State University); Anastasia Reinhardt (George Fox University); Hridesh Rajan (Iowa State University); Miryung Kim (University of California, Los Angeles)
Venue	Proceedings of the 40th International Conference on Software Engineering

Why it matters

If you’re stuck with a programming problem or have just started experimenting with a new framework or library, code examples on Stack Overflow can be tremendously helpful.

Many of them are short and to the point, which makes them easy to understand and reuse.

Unfortunately, herein also lies the rub: the code examples don’t always show all the code that one should use in a production environment.

For example, a code example might show how to open and read from a file, but neglect to point out that you first need to check whether that file actually exists or that the file handle should be closed afterwards.

This may cause all kinds of issues when software is deployed in a production environment, like resource leaks and program crashes.
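The file-reading example can be made concrete with a short Java sketch. The class and method names are hypothetical, and the code is an illustration rather than something taken from the study. The first method is the kind of concise snippet the authors warn about. The second checks that the file exists, closes the reader with try-with-resources, and handles readLine() returning null for an empty file.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Optional;

final class ReadFirstLine {

    // Typical "short and to the point" snippet: fine in a demo, but it throws
    // an IOException (NoSuchFileException) if the file is missing and never
    // closes the reader, leaking the file handle.
    static String fragile(String name) throws IOException {
        BufferedReader reader = Files.newBufferedReader(Paths.get(name), StandardCharsets.UTF_8);
        return reader.readLine();
    }

    // More defensive version: check that the file exists, close the reader
    // automatically via try-with-resources, and treat a null from readLine()
    // (empty file) or an I/O failure as "no line".
    static Optional<String> defensive(String name) {
        Path path = Paths.get(name);
        if (!Files.isRegularFile(path)) {
            return Optional.empty();
        }
        try (BufferedReader reader = Files.newBufferedReader(path, StandardCharsets.UTF_8)) {
            return Optional.ofNullable(reader.readLine());
        } catch (IOException e) {
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        System.out.println(defensive("does-not-exist.txt").isPresent());
    }
}
```

The existence check alone isn't sufficient, because the file can disappear between the check and the open. Catching IOException covers that case, and returning an Optional forces callers to deal with the "no line" outcome.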

How the study was conducted

The goal of the study is to determine whether and how code examples on Stack Overflow differ from best practices when it comes to using libraries.

Discovering these best practices is far from trivial: there are countless libraries, and each has its own gotchas and best practices.

Mining GitHub

The authors therefore designed a tool called ExampleCheck.

ExampleCheck infers API usage patterns in three steps:

1. It searches GitHub for snippets in which an API’s method is invoked, then uses program slicing to filter out statements that are specific to the program. The result is a normalised representation: a sequence of statements related to the invoked method. Normalising strips project-specific details, such as code style, so that two snippets from different programs that essentially do the same thing look identical.
2. It identifies common patterns in the sequences of statements surrounding calls to the API’s method, and filters out calls that appear in only a few outlier examples.
3. It determines which guard conditions should precede API method calls. It first canonicalises the guard conditions, replacing project-specific predicates with true and giving API-specific variables generic names. The conditions are then simplified and merged until only the most frequently appearing patterns remain.
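The idea behind the pattern-mining step can be illustrated with a heavily simplified Java sketch. This is not ExampleCheck’s actual algorithm. It assumes each snippet has already been reduced to a list of call names. It only looks at ordered pairs of calls, and it treats a pair as a pattern when the pair appears, in order, in at least a chosen fraction (minSupport) of the snippets. The class name, method names and sample sequences are all hypothetical.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class PatternMiner {

    // A sequence "contains" a pattern if the pattern's calls occur in it in
    // the same order, not necessarily next to each other.
    static boolean containsInOrder(List<String> sequence, List<String> pattern) {
        int next = 0;
        for (String call : sequence) {
            if (next < pattern.size() && call.equals(pattern.get(next))) {
                next++;
            }
        }
        return next == pattern.size();
    }

    // Keeps ordered call pairs (a before b) whose support, i.e. the fraction of
    // sequences containing them, is at least minSupport; rarer pairs are dropped.
    static Map<String, Double> frequentPairs(List<List<String>> sequences, double minSupport) {
        Map<String, List<String>> candidates = new TreeMap<>();
        for (List<String> seq : sequences) {
            for (int i = 0; i < seq.size(); i++) {
                for (int j = i + 1; j < seq.size(); j++) {
                    candidates.putIfAbsent(seq.get(i) + " -> " + seq.get(j),
                            List.of(seq.get(i), seq.get(j)));
                }
            }
        }
        Map<String, Double> frequent = new TreeMap<>();
        for (Map.Entry<String, List<String>> candidate : candidates.entrySet()) {
            long hits = sequences.stream()
                    .filter(s -> containsInOrder(s, candidate.getValue()))
                    .count();
            double support = (double) hits / sequences.size();
            if (support >= minSupport) {
                frequent.put(candidate.getKey(), support);
            }
        }
        return frequent;
    }

    public static void main(String[] args) {
        List<List<String>> sequences = List.of(
                List.of("exists", "new FileReader", "readLine", "close"),
                List.of("exists", "new FileReader", "readLine", "close"),
                List.of("new FileReader", "readLine", "close"),
                List.of("new FileReader", "readLine"));
        frequentPairs(sequences, 0.75)
                .forEach((pair, support) -> System.out.println(pair + "  support=" + support));
    }
}
```

With a threshold of 0.75, every pair involving exists is dropped because it shows up in only half of the sample sequences. The pairs new FileReader -> readLine, new FileReader -> close and readLine -> close are kept.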

The authors ran ExampleCheck on 380,000 GitHub projects for 100 popular Java API methods from 9 different domains. On average, each method has about 55,000 associated snippets, ranging from 211 to more than 450,000.

ExampleCheck infers 245 API usage patterns. Manual inspection shows that 180 of those patterns are usable for the next part of the study.

Mining Stack Overflow

The authors extract code snippets from all Stack Overflow answer posts that mention one of the 100 Java API methods, and gather some additional information for each post, like the number of votes and whether the post was accepted as a correct answer.
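The extraction step can be sketched as follows. The sketch assumes the answer body is available as HTML, with code inside <code> elements. It is not the authors’ extraction pipeline. The class name, the regex-based parsing and the name-based method check are simplifications chosen for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class SnippetExtractor {

    // Non-greedy match of everything between <code> and </code>; DOTALL lets
    // "." span line breaks inside multi-line snippets.
    private static final Pattern CODE = Pattern.compile("<code>(.*?)</code>", Pattern.DOTALL);

    // Returns the text of each <code> element separately, HTML entities undone.
    static List<String> codeBlocks(String htmlBody) {
        List<String> blocks = new ArrayList<>();
        Matcher m = CODE.matcher(htmlBody);
        while (m.find()) {
            blocks.add(unescape(m.group(1)));
        }
        return blocks;
    }

    // Undoes the most common HTML entities; &amp; goes last so that
    // "&amp;lt;" correctly becomes "&lt;" rather than "<".
    static String unescape(String s) {
        return s.replace("&lt;", "<")
                .replace("&gt;", ">")
                .replace("&quot;", "\"")
                .replace("&amp;", "&");
    }

    // Naive textual check: does any block call a method with this name?
    static boolean mentionsMethod(List<String> blocks, String methodName) {
        Pattern call = Pattern.compile("\\b" + Pattern.quote(methodName) + "\\s*\\(");
        return blocks.stream().anyMatch(b -> call.matcher(b).find());
    }

    public static void main(String[] args) {
        String body = "<p>Try this:</p><pre><code>String s = line.substring(0, i);\n"
                + "if (a &lt; b) {}</code></pre>";
        List<String> blocks = codeBlocks(body);
        System.out.println(blocks.size() + " block(s), mentions substring: "
                + mentionsMethod(blocks, "substring"));
    }
}
```

A regex is fine for a sketch but not for arbitrary HTML, where an HTML parser is the safer choice. Returning each block separately also shows why an example spread over several <code> blocks is hard to analyse as a single unit.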

ExampleCheck is used to check whether the sequence of method calls in each snippet is subsumed by one of the identified API usage patterns.
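Here is one simplified reading of this check, offered as an assumption for illustration. A pattern is treated as an ordered list of calls (or guard tokens) that must appear in the snippet, with unrelated statements allowed in between. A snippet is flagged if it follows none of the patterns mined for the API. ExampleCheck’s real check also takes guard conditions into account. The class name, method names and the TRY token are hypothetical.

```java
import java.util.List;

final class MisuseCheck {

    // A snippet "follows" a pattern if every element of the pattern appears in
    // the snippet's call sequence, in the same order (gaps are allowed).
    static boolean follows(List<String> snippet, List<String> pattern) {
        int next = 0;
        for (String call : snippet) {
            if (next < pattern.size() && call.equals(pattern.get(next))) {
                next++;
            }
        }
        return next == pattern.size();
    }

    // A snippet is reported as potential API misuse if it follows none of the
    // patterns mined for that API.
    static boolean isPotentialMisuse(List<String> snippet, List<List<String>> patterns) {
        return patterns.stream().noneMatch(pattern -> follows(snippet, pattern));
    }

    public static void main(String[] args) {
        // Hypothetical patterns for reading a file: either close it explicitly,
        // or open it inside try-with-resources (TRY is an illustrative token).
        List<List<String>> patterns = List.of(
                List.of("new FileReader", "readLine", "close"),
                List.of("TRY", "new FileReader", "readLine"));
        List<String> answer = List.of("new FileReader", "readLine");
        System.out.println("potential misuse: " + isPotentialMisuse(answer, patterns));
    }
}
```

Allowing gaps means that extra, unrelated statements in an answer don’t trigger a warning on their own. Only a missing or misordered required call does.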

A manual verification of 400 randomly selected posts suggests that about three quarters of the reported posts are true positives.

False positives are generally caused by ExampleCheck’s lack of deep knowledge about the postconditions of methods, and by usage patterns that are correct but not used very frequently. Warnings in the surrounding natural-language text and examples that are split over several <code> blocks also result in false positives.

What discoveries were made

ExampleCheck detects potential API misuse in 31% of Stack Overflow posts that were considered for this study.

If reused without modification, the code in these posts would likely result in crashes (76%), incomplete actions (18%), or resource leaks (2%).

Specifically, examples for database, IO, and networking APIs often lack exception handling and fail to close resources properly. Examples for cryptography and string manipulation APIs are unreliable for similar reasons: input and output should always be validated, especially if a method might return a null value or throw exceptions.
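The following sketch illustrates the string-manipulation case by contrasting an unguarded substring call with one that validates both its input and the value returned by indexOf. Using String.substring and indexOf is our choice and not necessarily an example from the study. The class and method names are hypothetical.

```java
import java.util.Optional;

final class PrefixExample {

    // Unguarded: if ':' is missing, indexOf returns -1 and substring throws
    // StringIndexOutOfBoundsException; a null input throws NullPointerException.
    static String fragilePrefix(String line) {
        return line.substring(0, line.indexOf(':'));
    }

    // Guarded: validate the input and the value returned by indexOf before
    // calling substring, and make "no prefix" an explicit result.
    static Optional<String> prefix(String line) {
        if (line == null) {
            return Optional.empty();
        }
        int colon = line.indexOf(':');
        if (colon < 0) {
            return Optional.empty();
        }
        return Optional.of(line.substring(0, colon));
    }

    public static void main(String[] args) {
        System.out.println(prefix("key:value"));      // Optional[key]
        System.out.println(prefix("no colon here"));  // Optional.empty
    }
}
```

Both versions behave identically on well-formed input, which is why the unguarded one looks fine in a short answer. The difference only shows up on the inputs a production system eventually receives.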

This wouldn’t really be an issue if it were clear to readers which posts contain API misuse. Unfortunately, that isn’t the case.

Highly voted posts, for example, aren’t necessarily more reliable, and posts with API misuse have more views on average than posts without any misuse. A possible reason is that highly voted posts tend to contain concise step-by-step explanations: they’re written for simplicity and readability rather than for real-world circumstances.

It would be helpful if Stack Overflow provided some way to show best practices next to code snippets in answer posts. The authors propose a browser extension that adds this functionality. You can find a screenshot and description of the extension in the original article.