How not to structure your database-backed web applications: A study of performance bugs in the wild (2018)

Some Buy ORM employees trying to retrieve something for the customer in an inefficient way

By abstracting away the details of querying databases, Object Relational Mapping (ORM) frameworks allow us to build Web applications efficiently. Sadly, those same abstractions can also prevent them from running efficiently. Yang, Subramaniam, Lu, Yan, and Cheung identified 9 performance anti-patterns and ways to mitigate them.

Why it matters

Performance is important: users expect Web pages to load quickly, and are quick to abandon a Web site if it takes too long to load.

Achieving that performance requires understanding of the application and the DBMS. What makes this hard, is that the latter is typically abstracted away by an ORM framework. Moreover, most performance issues are hard to expose during testing and may only occur with larger datasets (like those on production).

How the study was conducted

The authors installed 12 popular Ruby on Rails applications from GitHub, and populated each of the applications using representative test dataTest data are generated by collecting statistics from the application’s website or websites from similar applications. The test data are then inserted into the applications using a crawler that submits forms. of various sizes (200, 2,000 and 20,000 records in the main database table).

Each application is hosted on an AWS m4.xlarge instance and crawled for two hours, while various metrics are written to a logfile. This process is repeated three times for every test set size.

What discoveries were made

The identified anti-patterns can be grouped into three categories: ORM API misuses, database design, and application design trade-offs.

ORM API misuses

ORM frameworks offer multiple ways to do things, but developers don’t always pick the most efficient option: about half of all performance issues are caused by incorrect usage of ORM APIs.

Anti-pattern 1: Inefficient computation

The computation done by the code is useful, but retrieves more information than needed. Examples include:

Anti-pattern 2: Unnecessary computation

Some performance issues are caused by queries that could have been avoided altogether, for instance when:

Anti-pattern 3: Inefficient data accessing

Data transfer might be slowed down if:

Anti-pattern 4: Unnecessary data retrieval

Data is retrieved that’s never actually used, e.g. retrieving entire records when you are only interested in the ids.

Anti-pattern 5: Inefficient rendering

Each object from a result set is rendered separately using a partial, when it is more efficient to render all objects using string substitution. This solution would have a negative impact on readability howeverIt’s also not clear whether this issue also occurs in similar application frameworks like Django and Laravel.

Database design problems

Suboptimal database design can also cause performance issues.

Anti-pattern 6: Missing fields

Information that can be derived from other fields, should normally not be stored in the database – but if it’s computationally expensive to derive them on the fly, it might be a good idea to store it anyway.

Anti-pattern 7: Missing database indices

It’s generally known that adding indices to tables helps speed up queries, but nonetheless they’re still a common source of performance issues. The authors speculate that developers don’t have enough expertise to pick the right columns to index, and therefore only add them incrementally whenever a performance issue is reported.

Application design trade-offs

Some performance issues are best solved by modifying the application’s functionality.

Anti-pattern 8: Content display trade-offs

Most scalability problems in the tested applications were caused by controller actions that retrieve all database records for a single page. This can easily be solved using pagination.

Anti-pattern 9: Application functionality trade-offs

Sometimes the cheapest solution is to simply remove functionality altogether.

Speedups

The authors applied 64 fixes in total and found that most fixes were very effective, achieving speedups of at least 2⨉. About a quarter achieves a speedup of 5⨉, while the largest speedup (39⨉) is achieved by removing functionality.

Most of the fixes are small and can be applied using less than 10 lines of code.

The important bits

  1. Anti-patterns in ORM API usage, and database and application design can cause performance problems in Web applications
  2. Removing anti-patterns can often result in 2⨉ speedups
  3. Many databased-related performance issues can be resolved using only a few lines of code