Chuniversiteit.nl
The Toilet Paper

How not to structure your database-backed web applications

ORM frameworks make your life a lot easier, but can just as easily lead to performance problems.

Some Buy ORM employees trying to retrieve something for the customer in an inefficient way

By abstracting away the details of querying databases, Object Relational Mapping (ORM) frameworks allow us to build Web applications efficiently. Sadly, those same abstractions can also prevent them from running efficiently. Yang, Subramaniam, Lu, Yan, and Cheung identified 9 performance anti-patterns and ways to mitigate them.

Why it matters

Performance is important: users expect Web pages to load quickly, and are quick to abandon a Web site if it takes too long to load.

Achieving that performance requires understanding of the application and the database management system (DBMS). What makes this hard, is that the latter is typically abstracted away by an ORM framework. Moreover, most performance issues are hard to expose during testing and may only occur with larger datasets (like those on production).

How the study was conducted

The authors installed 12 popular Ruby on Rails applications from GitHub, and populated each of the applications using of various sizes (200, 2,000 and 20,000 records in the main database table).

Each application is hosted on an AWS m4.xlarge instance and crawled for two hours, while various metrics are written to a logfile. This process is repeated three times for every test set size.

What discoveries were made

The identified anti-patterns can be grouped into three categories: ORM API misuses, database design, and application design trade-offs.

ORM API misuses

ORM frameworks offer multiple ways to do things, but developers don’t always pick the most efficient option: about half of all performance issues are caused by incorrect usage of ORM APIs.

Anti-pattern 1: Inefficient computation

The computation done by the code is useful, but retrieves more information than needed. Examples include:

  • Using a COUNT(*) to check whether there’s any record that satisfies a condition, when a SELECT 1 FROM ... LIMIT 1 would have sufficed;

  • Retrieving records from the database only to count them using application code;

  • Retrieving records from the database when the same information is already present in objects that are already in memory.

Anti-pattern 2: Unnecessary computation

Some performance issues are caused by queries that could have been avoided altogether, for instance when:

  • The same query is executed within each iteration of a for-loop, even though the result stays the same;

  • Queries are repeatedly reissued to update in-memory copies of objects, regardless of whether the objects are (immediately) used;

  • Queries are executed even though the results are already known.

Anti-pattern 3: Inefficient data accessing

Data transfer might be slowed down if:

  • records from referenced (or referencing) tables are rather than loaded eagerly;

  • so much data is eagerly loaded that it creates huge memory pressure;

  • N queries are issued to update N records, when they can also be batched together into a single query.

Anti-pattern 4: Unnecessary data retrieval

Data is retrieved that’s never actually used, e.g. retrieving entire records when you are only interested in the ids.

Anti-pattern 5: Inefficient rendering

Each object from a result set is rendered separately using a partial, when it is more efficient to render all objects using string substitution. .

Database design problems

Suboptimal database design can also cause performance issues.

Anti-pattern 6: Missing fields

Information that can be derived from other fields, should normally not be stored in the database – but if it’s computationally expensive to derive them on the fly, it might be a good idea to store it anyway.

Anti-pattern 7: Missing database indices

It’s generally known that adding indices to tables helps speed up queries, but nonetheless they’re still a common source of performance issues. The authors speculate that developers don’t have enough expertise to pick the right columns to index, and therefore only add them incrementally whenever a performance issue is reported.

Application design trade-offs

Some performance issues are best solved by modifying the application’s functionality.

Anti-pattern 8: Content display trade-offs

Most scalability problems in the tested applications were caused by controller actions that retrieve all database records for a single page. This can easily be solved using pagination.

Anti-pattern 9: Application functionality trade-offs

Sometimes the cheapest solution is to simply remove functionality altogether.

Speedups

The authors applied 64 fixes in total and found that most fixes were very effective, achieving speedups of at least 2⨉. About a quarter achieves a speedup of 5⨉, while the largest speedup (39⨉) is achieved by removing functionality.

Most of the fixes are small and can be applied using less than 10 lines of code.

Summary

  1. Anti-patterns in ORM API usage, and database and application design can cause performance problems in Web applications

  2. Removing anti-patterns can often result in 2⨉ speedups

  3. Many databased-related performance issues can be resolved using only a few lines of code