Untangling the web of parent-child relationships across multiple hierarchical levels can be challenging, yet it’s crucial for insightful data analysis. Frequently, we need to identify the apex of these hierarchies, the ‘ultimate parent’, in order to group data for analysis. However, the unpredictable number of levels within these hierarchies can complicate this task. In this post, I will show two queries that can help you accurately pinpoint the ultimate parent, regardless of the complexity of the hierarchical relationships.
Parent-Child Table Example
To get us started, let’s look at a simple example. Here’s a table showing parent-child relationships in a typical SQL hierarchy with just two columns:
child_id | parent_id |
---|---|
1 | NULL |
2 | 1 |
3 | 1 |
4 | 2 |
5 | 2 |
6 | 3 |
7 | 3 |
8 | 4 |
Advanced Approach: Using Recursive Queries to Trace Hierarchies
If available, a recursive query is an efficient way to get to the ultimate parent. A recursive query will keep querying data until we meet a certain condition. In our case, that condition is reaching the ultimate parent.
WITH RECURSIVE ultimate_parent AS ( -- find the ultimate parent SELECT child_id, parent_id, child_id AS Ultimateparent_id FROM account WHERE parent_id IS NULL -- Assuming the top-level parent has NULL parent_id UNION ALL SELECT s.child_id, s.parent_id, r.Ultimateparent_id FROM account AS s JOIN ultimate_parent AS r ON s.parent_id = r.child_id ) SELECT child_id, parent_id, Ultimateparent_id FROM ultimate_parent
Alternative Approach: Navigating Hierarchies with Self-Joins
Using a recursive query isn’t an option, or you might be looking for a more straightforward, less advanced method. This alternative approach involves repeatedly joining the same table on itself – a technique known as self-join. Let’s take a look at this less complex, but equally effective method:
SELECT t1.child_id, t1.parent_id, COALESCE(t10.parent_id,t9.parent_id,t8.parent_id,t7.parent_id,t6.parent_id,t5.parent_id, t4.parent_id, t3.parent_id, t2.parent_id, t1.parent_id, t1.child_id) AS ultimate_parent_id FROM account t1 LEFT JOIN account t2 ON t1.parent_id = t2.child_id LEFT JOIN account t3 ON t2.parent_id = t3.child_id LEFT JOIN account t4 ON t3.parent_id = t4.child_id LEFT JOIN account t5 ON t4.parent_id = t5.child_id LEFT JOIN account t6 ON t5.parent_id = t6.child_id LEFT JOIN account t7 ON t6.parent_id = t7.child_id LEFT JOIN account t8 ON t7.parent_id = t8.child_id LEFT JOIN account t9 ON t8.parent_id = t9.child_id LEFT JOIN account t10 ON t9.parent_id = t10.child_id
In this query, we start with our main table, account
, aliased as t1
. We then use a series of LEFT JOIN
operations to join account
back onto itself multiple times, each time going up one level in the parent-child hierarchy. We do this up to 10 times (t10
) to cater to hierarchies up to 10 levels deep.
The COALESCE
function then steps through the parent_id
s from the highest level (t10
) down to the first level (t1
). The function returns the first non-null parent_id
it encounters, which is the child_id
of the ultimate parent.
The beauty of this query is that it can work in many SQL systems that might not support recursive queries. However, it does have a limitation: it can only traverse as many levels as the number of joins you include in your query. In our case, that’s 10 levels. If you have a deeper hierarchy, you’d need to add more joins.
This might not be as elegant or flexible as the recursive method, but it’s a solid approach that gets the job done when you need a less advanced solution.
Final Thoughts
Check out more Python tricks in this Colab Notebook or in my recent Python Posts.
Thanks for reading!