Chuniversiteit logomarkChuniversiteit.nl
“Heap, Heap, Array!”

When and how to JOIN a table with itself

Relational databases make it very easy to join data from different tables, but did you know you can also JOIN data within the same table?

Pam Beesly, in the “They’re the same picture” scene
“They’re the same table.”

Relational databases make it easy to JOIN (combine) data from different tables. However, you can also JOIN a table with itself. This is especially handy when a table already contains all the data you need, but not in the right format.

References to the same table

Link

One of the most common reasons to JOIN a table with itself, is because you have a table that contains a foreign key reference to itself.

In the example below we have an employees table with employees. The table includes a manager_id column that refers to an employee in that same employees table.

employees
id name manager_id
1 Denholm
2 Richmond
3 Roy 5
4 Maurice 5
5 Jen 1

If we want to create an overview that lists the name of each employee along with the name of their manager, we need to join the employees table with itself.

When we do this, we need to assign to one (or both) of them so that the database can distinguish between the two instances. It doesn’t really matter what names you use, as long as they’re unique and make sense to you.

Since we have two tables with identical column names, the database no longer understands what you mean when you say you want something like a name. You therefore have to explicitly tell it that you want the name of some row in the employees or the managers version of the table. This can be done by prepending column names with a table alias, followed by a .:

This query gives us the following result:

Computing durations between changes

Link

Another common reason to use self joins is to calculate some duration or distance between pairs of rows within a table.

For example, the order_state table below keeps track of state changes that happen to orders at a webshop. Each record in this table includes a reference to the order, the name of the new state, and shows when the order changed to its new state. Can we calculate how long each of these state transitions took?

order_state
id order_id state created_at
1 1 placed 2021-06-01 00:01:00
2 1 packaged 2021-06-01 00:58:00
3 1 despatched 2021-06-01 02:20:00
4 1 delivered 2021-06-01 15:20:00
5 2 placed 2021-06-02 14:10:00
6 2 packaged 2021-06-02 18:55:00
7 3 placed 2021-06-02 19:00:00
8 3 packaged 2021-06-02 20:30:00
9 3 despatched 2021-06-03 01:40:00
10 2 despatched 2021-06-03 01:40:00
11 2 delivered 2021-06-03 08:00:00
12 3 delivered 2021-06-03 16:10:00

Of course we can! The query below shows how.

Once again, we have one table, order_state, that we want to JOIN with itself. We name one of these from_state and the other to_state, to represent the collection of states before and after the transition.

Obviously, the two states involved in the transition should belong to the same order, so we add the condition that from_state.order_id must be equal to to_state.order_id. Moreover, the to_state should occur later than the from_state, so we’ll also add a condition that the .

Finally, we want to make sure that we only see direct state transitions, like those from placed to packages, and not indirect ones, like those from placed to delivered. This is done by first grouping by from_state.id, which ensures that the “from” state appears only once in the result, and then JOINING each from_state with the closest to_state record that exists, i.e. the record with the lowest to_state.id.

Creating pairs of data

Link

Self joins can also be used to easily create large or unique pairs of data from a relatively small set of data.

Cartesian product

Link

You can to create a result set that includes every possible combination of values within a single table, for instance when you want to know how many possible configurations exist for a product or when you wish to discover which combinations are common or rare.

In the example below, we combine multiple instances of a digit table to form larger numbers.

digit
id
0
1
2
3
4
5
6
7
8
9

I once used a very similar method to generate lists of dates in a legacy business intelligence application that was built on top of a MySQL database:

Unique pairs

Link

Sometimes you’re only interested in unique combinations. For example when you need to create pairs of people or want to list : it doesn’t matter which value is listed first or second – what matters is that the first and second values are listed together exactly once.

The example below shows a simple person table that contains the names of six people.

person
id name
1 Rachel
2 Monica
3 Phoebe
4 Joey
5 Chandler
6 Ross

We can create a list of possible pairings by adding a simple condition to the JOIN clause:

Ranking results

Link

Self-JOINs can also be used to compute ranks of query results, although nowadays there are much better ways to do this.