When and how to JOIN a table with itself

Relational databases make it easy to JOIN (combine) data from different tables. However, you can also JOIN a table with itself. This is especially handy when a table already contains all the data you need, but not in the right format.
One of the most common reasons to JOIN a table with itself, is because you have a table that contains a foreign key reference to itself.
In the example below we have an employees
table with employees. The table includes
a manager_id
column that refers to an employee in that same employees
table.
employees | ||
---|---|---|
id | name | manager_id |
1 | Denholm | |
2 | Richmond | |
3 | Roy | 5 |
4 | Maurice | 5 |
5 | Jen | 1 |
If we want to create an overview that lists the name
of each employee along
with the name
of their manager, we need to join the employees
table with
itself.
When we do this, we need to assign to one (or both) of them so that the database can distinguish between the two instances. It doesn’t really matter what names you use, as long as they’re unique and make sense to you.
Since we have two tables with identical column names, the database no longer
understands what you mean when you say you want something like a name
. You
therefore have to explicitly tell it that you want the name
of some row in the
employees
or the managers
version of the table. This can be done by
prepending column names with a table alias, followed by a .
:
This query gives us the following result:
Another common reason to use self joins is to calculate some duration or distance between pairs of rows within a table.
For example, the order_state
table below keeps track of state changes that
happen to orders at a webshop. Each record in this table includes a reference to
the order, the name of the new state, and shows when the order changed to its
new state. Can we calculate how long each of these state transitions took?
order_state | |||
---|---|---|---|
id | order_id | state | created_at |
1 | 1 | placed | 2021-06-01 00:01:00 |
2 | 1 | packaged | 2021-06-01 00:58:00 |
3 | 1 | despatched | 2021-06-01 02:20:00 |
4 | 1 | delivered | 2021-06-01 15:20:00 |
5 | 2 | placed | 2021-06-02 14:10:00 |
6 | 2 | packaged | 2021-06-02 18:55:00 |
7 | 3 | placed | 2021-06-02 19:00:00 |
8 | 3 | packaged | 2021-06-02 20:30:00 |
9 | 3 | despatched | 2021-06-03 01:40:00 |
10 | 2 | despatched | 2021-06-03 01:40:00 |
11 | 2 | delivered | 2021-06-03 08:00:00 |
12 | 3 | delivered | 2021-06-03 16:10:00 |
Of course we can! The query below shows how.
Once again, we have one table, order_state
, that we want to JOIN with itself.
We name one of these from_state
and the other to_state
, to represent the
collection of states before and after the transition.
Obviously, the two states involved in the transition should belong to the same
order, so we add the condition that from_state.order_id
must be equal to
to_state.order_id
. Moreover, the to_state
should occur later than the
from_state
, so we’ll also add a condition that the .
Finally, we want to make sure that we only see direct state transitions, like
those from placed
to packages
, and not indirect ones, like those from
placed
to delivered
. This is done by first grouping by from_state.id
, which
ensures that the “from” state appears only once in the result, and then JOINING
each from_state
with the closest to_state
record that exists, i.e. the
record with the lowest to_state.id
.
Self joins can also be used to easily create large or unique pairs of data from a relatively small set of data.
You can to create a result set that includes every possible combination of values within a single table, for instance when you want to know how many possible configurations exist for a product or when you wish to discover which combinations are common or rare.
In the example below, we combine multiple instances of a digit
table to form
larger numbers.
digit |
---|
id |
0 |
1 |
2 |
3 |
4 |
5 |
6 |
7 |
8 |
9 |
I once used a very similar method to generate lists of dates in a legacy business intelligence application that was built on top of a MySQL database:
Sometimes you’re only interested in unique combinations. For example when you need to create pairs of people or want to list : it doesn’t matter which value is listed first or second – what matters is that the first and second values are listed together exactly once.
The example below shows a simple person
table that contains the names of six
people.
person | |
---|---|
id | name |
1 | Rachel |
2 | Monica |
3 | Phoebe |
4 | Joey |
5 | Chandler |
6 | Ross |
We can create a list of possible pairings by adding a simple condition to the JOIN clause:
Self-JOINs can also be used to compute ranks of query results, although nowadays there are much better ways to do this.