How not to structure your database-backed web applications: A study of performance bugs in the wild (2018)
By abstracting away the details of querying databases, Object Relational Mapping (ORM) frameworks allow us to build Web applications efficiently. Sadly, those same abstractions can also prevent them from running efficiently. Yang, Subramaniam, Lu, Yan, and Cheung identified 9 performance anti-patterns and ways to mitigate them.
Why it matters
Performance is important: users expect Web pages to load quickly, and are quick to abandon a Web site if it takes too long to load.
Achieving that performance requires understanding of the application and the DBMS. What makes this hard, is that the latter is typically abstracted away by an ORM framework. Moreover, most performance issues are hard to expose during testing and may only occur with larger datasets (like those on production).
How the study was conducted
The authors installed 12 popular Ruby on Rails applications from GitHub, and populated each of the applications using representative test dataTest data are generated by collecting statistics from the application’s website or websites from similar applications. The test data are then inserted into the applications using a crawler that submits forms. of various sizes (200, 2,000 and 20,000 records in the main database table).
Each application is hosted on an AWS m4.xlarge instance and crawled for two hours, while various metrics are written to a logfile. This process is repeated three times for every test set size.
What discoveries were made
The identified anti-patterns can be grouped into three categories: ORM API misuses, database design, and application design trade-offs.
ORM API misuses
ORM frameworks offer multiple ways to do things, but developers don’t always pick the most efficient option: about half of all performance issues are caused by incorrect usage of ORM APIs.
Anti-pattern 1: Inefficient computation
The computation done by the code is useful, but retrieves more information than needed. Examples include:
- Using a
COUNT(*)to check whether there’s any record that satisfies a condition, when a
SELECT 1 FROM ... LIMIT 1would have sufficed;
- Retrieving records from the database only to count them using application code;
- Retrieving records from the database when the same information is already present in objects that are already in memory.
Anti-pattern 2: Unnecessary computation
Some performance issues are caused by queries that could have been avoided altogether, for instance when:
- The same query is executed within each iteration of a for-loop, even though the result stays the same;
- Queries are repeatedly reissued to update in-memory copies of objects, regardless of whether the objects are (immediately) used;
- Queries are executed even though the results are already known.
Anti-pattern 3: Inefficient data accessing
Data transfer might be slowed down if:
- records from referenced (or referencing) tables are retrieved using separate queriesThis is also known as the “N+1” problem rather than loaded eagerly;
- so much data is eagerly loaded that it creates huge memory pressure;
- N queries are issued to update N records, when they can also be batched together into a single query.
Anti-pattern 4: Unnecessary data retrieval
Data is retrieved that’s never actually used, e.g. retrieving entire records when you are only interested in the
Anti-pattern 5: Inefficient rendering
Each object from a result set is rendered separately using a partial, when it is more efficient to render all objects using string substitution. This solution would have a negative impact on readability howeverIt’s also not clear whether this issue also occurs in similar application frameworks like Django and Laravel.
Database design problems
Suboptimal database design can also cause performance issues.
Anti-pattern 6: Missing fields
Information that can be derived from other fields, should normally not be stored in the database – but if it’s computationally expensive to derive them on the fly, it might be a good idea to store it anyway.
Anti-pattern 7: Missing database indices
It’s generally known that adding indices to tables helps speed up queries, but nonetheless they’re still a common source of performance issues. The authors speculate that developers don’t have enough expertise to pick the right columns to index, and therefore only add them incrementally whenever a performance issue is reported.
Application design trade-offs
Some performance issues are best solved by modifying the application’s functionality.
Anti-pattern 8: Content display trade-offs
Most scalability problems in the tested applications were caused by controller actions that retrieve all database records for a single page. This can easily be solved using pagination.
Anti-pattern 9: Application functionality trade-offs
Sometimes the cheapest solution is to simply remove functionality altogether.
The authors applied 64 fixes in total and found that most fixes were very effective, achieving speedups of at least 2⨉. About a quarter achieves a speedup of 5⨉, while the largest speedup (39⨉) is achieved by removing functionality.
Most of the fixes are small and can be applied using less than 10 lines of code.