Chuniversiteit.nl
“Heap, Heap, Array!”

Storing trees in a database table

Relational databases are very good at storing tabular data, but what if you need to store arbitrarily-sized trees?

Christmas tree in a coffee table
I mean, technically it fits

Many things that can be modelled as a tree. Organisational hierarchies for example, or directory structures, or geographical divisions like this one:

Trees aren’t very complicated data structures. We can easily whip up a Location class that is able to represent both a tree and its elements:

The following code snippets shows how we can use this Location class to create an instance that contains the entire tree from the example above:

This is nice and all, but it would be even nicer if we could also store the contents of $world somewhere.

There are several NoSQL databases that are made specifically to store and query data structures like trees, but sometimes all you have – or want – is a relational database. Maybe you don’t want to maintain a separate NoSQL database. Or maybe you are bound to an ORM framework that only supports relational databases, like MySQL and SQLite.

Fortunately it’s totally possible to store trees in a relational database!

A simple solution

We could simply convert the Location class into an ORM-managed entity that looks roughly like this:

location
idnameparent_idorder
1World0
2North America10
3United States20
4California30
5Berkeley40
6Pasadena41
7Massachusetts21
8Europe11
9France80
10United Kingdom81
11England100
12London110
13Scotland101
14Wales102
15Northern Ireland103

Each record stores a single Location. With the exception of the root node, each Location stores a reference to its parent (parent_id) and its position in $children (order). This table contains all the information we need to recreate the original tree.

While this solution is intuitive, it suffers from several issues:

  • Almost each row contains a foreign key reference to a different row. This is a pretty minor issue, but it does mean that storing trees can be somewhat cumbersome.

  • Trees can only be retrieved recursively. This makes sense, because our Location class is also implemented as a recursive data structure. However, it often means that you need to execute multiple individual queries to retrieve the entire tree. Common table expressions (CTEs) allow you to execute recursive queries, but I haven’t encountered many ORM frameworks yet that support them natively.

Nested sets

There is another method to store trees in a relational database, without the need for foreign keys and recursive queries. The nested set model works by traversing the tree in a systematic way and assigning an incrementing number to each node that it enters or leaves.

The figure below shows what this looks like for the tree that I showed at the beginning of the article:

Result of applying a nested set algorithm to the example tree

The traversal starts at the root node (“World”) and then tries to visit the first child of every node that it can find. Once it encounters a leaf node (“Berkeley”) it tries to move to a sibling (“Pasadena”) or its parent’s sibling (“Massachusetts”). This process is repeated until it finally reaches the root node again.

After computing the “left” and “right” values for each node in a tree, we can simply store these values in a table:

location
idnameleftright
1World130
2North America213
3United States312
4California49
5Berkeley56
6Pasadena78
7Massachusetts1011
8Europe1429
9France1516
10United Kingdom1728
11England1821
12London1920
13Scotland2223
14Wales2425
15Northern Ireland2627

Note that this representation has a few interesting (albeit not always very useful) properties:

  • The left value of the root node is always equal to 1, while its right is always double the number of nodes in the tree.

  • The difference between the left and right values for a node tells you something about how large its subtree is. The smallest nodes (leaves) have no children, which always results in a difference of 1.

  • Each descendant of a node n has a left value that is greater than that of n and a right value that is less than that n. This means that we can simply retrieve the entire “North America” subtree by retrieving all Locations with left >= 2 AND right <= 13.

Unfortunately, this representation also suffers from a few minor issues.

The most notable issue with this representation is that it is not as intuitive as the simple solution that I described above. This should be a one-time problem however: once you have implemented the logic for the nested set model, there is no need to think about it ever again!

Another issue is that this representation makes it harder to figure out a node’s direct descendants (i.e. its children). We can easily fix this by adding another field that stores the level (or depth) of the node, so that you can add a WHERE child.level = node.level + 1 to your queries:

location
idnameleftrightlevel
1World1300
2North America2131
3United States3122
4California493
5Berkeley564
6Pasadena784
7Massachusetts10113
8Europe14291
9France15162
10United Kingdom17282
11England18213
12London19204
13Scotland22233
14Wales24253
15Northern Ireland26273

More about SQL

More about programming