How not to structure your database-backed web applications
By abstracting away the details of querying databases, Object Relational Mapping (ORM) frameworks allow us to build Web applications efficiently. Sadly, those same abstractions can also prevent them from running efficiently. Yang, Subramaniam, Lu, Yan, and Cheung identified 9 performance anti-patterns and ways to mitigate them.
Performance is important: users expect Web pages to load quickly, and are quick to abandon a Web site if it takes too long to load.
Achieving that performance requires understanding of the application and the database management system (DBMS). What makes this hard, is that the latter is typically abstracted away by an ORM framework. Moreover, most performance issues are hard to expose during testing and may only occur with larger datasets (like those on production).
The authors installed 12 popular Ruby on Rails applications from GitHub, and populated each of the applications using of various sizes (200, 2,000 and 20,000 records in the main database table).
Each application is hosted on an AWS m4.xlarge instance and crawled for two hours, while various metrics are written to a logfile. This process is repeated three times for every test set size.
The identified anti-patterns can be grouped into three categories: ORM API misuses, database design, and application design trade-offs.
ORM frameworks offer multiple ways to do things, but developers don’t always pick the most efficient option: about half of all performance issues are caused by incorrect usage of ORM APIs.
Anti-pattern 1: Inefficient computation
The computation done by the code is useful, but retrieves more information than needed. Examples include:
-
Using a
COUNT(*)
to check whether there’s any record that satisfies a condition, when aSELECT 1 FROM ... LIMIT 1
would have sufficed; -
Retrieving records from the database only to count them using application code;
-
Retrieving records from the database when the same information is already present in objects that are already in memory.
Anti-pattern 2: Unnecessary computation
Some performance issues are caused by queries that could have been avoided altogether, for instance when:
-
The same query is executed within each iteration of a for-loop, even though the result stays the same;
-
Queries are repeatedly reissued to update in-memory copies of objects, regardless of whether the objects are (immediately) used;
-
Queries are executed even though the results are already known.
Anti-pattern 3: Inefficient data accessing
Data transfer might be slowed down if:
-
records from referenced (or referencing) tables are rather than loaded eagerly;
-
so much data is eagerly loaded that it creates huge memory pressure;
-
N queries are issued to update N records, when they can also be batched together into a single query.
Anti-pattern 4: Unnecessary data retrieval
Data is retrieved that’s never actually used, e.g. retrieving entire records
when you are only interested in the id
s.
Anti-pattern 5: Inefficient rendering
Each object from a result set is rendered separately using a partial, when it is more efficient to render all objects using string substitution. .
Suboptimal database design can also cause performance issues.
Anti-pattern 6: Missing fields
Information that can be derived from other fields, should normally not be stored in the database – but if it’s computationally expensive to derive them on the fly, it might be a good idea to store it anyway.
Anti-pattern 7: Missing database indices
It’s generally known that adding indices to tables helps speed up queries, but nonetheless they’re still a common source of performance issues. The authors speculate that developers don’t have enough expertise to pick the right columns to index, and therefore only add them incrementally whenever a performance issue is reported.
Some performance issues are best solved by modifying the application’s functionality.
Anti-pattern 8: Content display trade-offs
Most scalability problems in the tested applications were caused by controller actions that retrieve all database records for a single page. This can easily be solved using pagination.
Anti-pattern 9: Application functionality trade-offs
Sometimes the cheapest solution is to simply remove functionality altogether.
The authors applied 64 fixes in total and found that most fixes were very effective, achieving speedups of at least 2⨉. About a quarter achieves a speedup of 5⨉, while the largest speedup (39⨉) is achieved by removing functionality.
Most of the fixes are small and can be applied using less than 10 lines of code.
-
Anti-patterns in ORM API usage, and database and application design can cause performance problems in Web applications
-
Removing anti-patterns can often result in 2⨉ speedups
-
Many databased-related performance issues can be resolved using only a few lines of code