Hierarchical querying #
Oracle provides a CONNECT BY
function to explore a hierarchical tree. This
proprietary functionality has advanced features such as loop detection and
provides pseudo-columns such as depth and path.
Since version 14, many advanced features are available with PostgreSQL. For earlier versions, significant work must be performed to port queries using theses clauses.
CONNECT BY #
Here is a SQL query exploring the emp
table’s hierarchy. The mgr
column in
this table points to the manager of the employee. If it is NULL, then this
employee is on top of the hierarchy (START WITH mgr IS NULL
). The link between
the employee and its manager is built with the CONNECT BY PRIOR empno = mgr
clause, indicating that the mgr
is the empno
identifier of the precedent
level of hierarchy.
SELECT empno, ename, job, mgr
FROM emp
START WITH mgr IS NULL
CONNECT BY PRIOR empno = mgr
This is converted with the help of a recursive query (WITH RECURSIVE
).
Recursion is initiated in a first query retrieving the lines matching the START WITH
condition of previous query: mgr is NULL
. Recursion continues then with
the following query doing a join between the emp
table and the virtual
emp_hierarchy
table defined by the WITH RECURSIVE
clause. The join condition
matches the previous CONNECT BY
. Here, emp_hierarchy
has been aliased to
prior
, to better illustrate the conversion.
Here is a possible rewriting then:
WITH RECURSIVE emp_hierarchy (empno, ename, job, mgr) AS (
SELECT empno, ename, job, mgr
FROM emp
WHERE mgr IS NULL
UNION ALL
SELECT emp.empno, emp.ename, emp.job, emp.mgr
FROM emp
JOIN emp_hierarchy prior ON (emp.mgr = prior.empno)
)
SELECT * FROM emp_hierarchy;
One should pay attention to the returned order of rows with the WITH RECURSIVE
query. Oracle uses by default a depth-first algorithm to implement CONNECT BY
. It will explore each branch before going to the next. WITH RECURSIVE
is
breadth-first, exploring each level before going onto the next.
It’s possible to get the CONNECT BY
order though, by sorting on a path column,
such as the one one would build to emulate SYS_CONNECT_BY_PATH
:
WITH RECURSIVE emp_hierarchy (empno, ename, job, mgr, path) AS (
SELECT empno, ename, job, mgr, ARRAY[ename::TEXT] AS path
FROM emp
WHERE mgr IS NULL
UNION ALL
SELECT emp.empno, emp.ename, emp.job, emp.mgr,
prior.path || emp.ename::TEXT AS path
FROM emp
JOIN emp_hierarchy prior ON (emp.mgr = prior.empno)
)
SELECT empno, ename, job FROM emp_hierarchy AS emp
ORDER BY path;
Oracle 11g has changed the default return order of CONNECT BY
queries.
LEVEL pseudo-column #
One can get the hierarchy depth of an element using the LEVEL
pseudo-column.
SELECT empno, ename, job, mgr, level
FROM emp
START WITH mgr IS NULL
CONNECT BY PRIOR empno = mgr
Porting LEVEL
is easy. Initialize a column called level
to 1, then increment
it in each new recursive join:
WITH RECURSIVE emp_hierarchy (empno, ename, job, mgr, level) AS (
SELECT empno, ename, job, mgr, 1 AS level
FROM emp
WHERE mgr IS NULL
UNION ALL
SELECT emp.empno, emp.ename, emp.job, emp.mgr, prior.level + 1
FROM emp
JOIN emp_hierarchy prior ON (emp.mgr = prior.empno)
)
SELECT * FROM emp_hierarchy;
SYS_CONNECT_BY_PATH function #
SYS_CONNECT_BY_PATH
is used to display the path taken to reach a record, each
element separated by a given character. For instance, the following query
returns all managers of an employee:
SELECT empno, ename, job, mgr, SYS_CONNECT_BY_PATH(ename, '/') AS path
FROM emp
START WITH mgr IS NULL
CONNECT BY PRIOR empno = mgr;
This is also quite easy to convert. One can add a root path ‘/’ in the first query of the recursion, then keep concatenating current path to previous path.
WITH RECURSIVE emp_hierarchy (empno, ename, job, mgr, path) AS (
SELECT empno, ename, job, mgr, '/' || ename AS path
FROM emp
WHERE mgr IS NULL
UNION ALL
SELECT emp.empno, emp.ename, emp.job, emp.mgr,
prior.path || '/' || emp.ename
FROM emp
JOIN emp_hierarchy prior ON (emp.mgr = prior.empno)
)
SELECT * FROM emp_hierarchy;
Another way is to use an array to store the path, then convert it to any other
representation (here the same as SYS_CONNECT_BY_PATH(ename, '/')
). This can be
used to detect loops all at once:
WITH RECURSIVE emp_hierarchy (empno, ename, job, mgr, path) AS (
SELECT empno, ename, job, mgr, ARRAY[ename::TEXT] AS path
FROM emp
WHERE mgr IS NULL
UNION ALL
SELECT emp.empno, emp.ename, emp.job, emp.mgr,
prior.path || emp.ename::TEXT AS path
FROM emp
JOIN emp_hierarchy prior ON (emp.mgr = prior.empno)
)
SELECT empno, ename, job, array_to_string(path, '/') AS path
FROM emp_hierarchy AS emp;
NOCYCLE clause #
The following Oracle query:
SELECT empno, ename, job, mgr
FROM emp
START WITH mgr IS NULL
CONNECT BY NOCYCLE PRIOR empno = mgr;
will be converted to PostgreSQL this way:
WITH RECURSIVE emp_hierarchy (empno, ename, job, mgr, path, is_cycle) AS (
SELECT empno, ename, job, mgr, ARRAY[ename::TEXT] AS path, false AS is_cycle
FROM emp
WHERE mgr IS NULL
UNION ALL
SELECT emp.empno, emp.ename, emp.job, emp.mgr,
prior.path || emp.ename::TEXT AS path,
emp.ename = ANY(prior.path) AS is_cycle
FROM emp
JOIN emp_hierarchy prior ON (emp.mgr = prior.empno)
WHERE is_cycle = false
)
SELECT empno, ename, job, mgr
FROM emp_hierarchy AS emp
WHERE is_cycle = false;
CONNECT_BY_IS_CYCLE clause #
The CONNECT_BY_IS_CYCLE
clause returns 1 if the current record has a child that
is also its ancestor. Else it will return 0. This can be emulated from the
previous query, with a conditional expression, and by removing the last filter:
WITH RECURSIVE emp_hierarchy (empno, ename, job, mgr, level, path, is_cycle) AS (
SELECT empno, ename, job, mgr, 1 AS level,
ARRAY[ename::TEXT] AS path, false AS is_cycle
FROM emp
WHERE mgr = 10000
UNION ALL
SELECT emp.empno, emp.ename, emp.job, emp.mgr,
prior.level + 1 AS level,
prior.path || emp.ename::TEXT AS path,
emp.ename = ANY(prior.path) AS is_cycle
FROM emp
JOIN emp_hierarchy prior ON (emp.mgr = prior.empno)
WHERE is_cycle = false
)
SELECT *, is_cycle::int connect_by_is_cycle
FROM emp_hierarchy AS emp;
ORDER SIBLINGS BY clause #
The ORDER SIBLINGS BY
clause can be emulated by starting from the recursive
query used for SYS_CONNECT_BY_PATH
, and sort on the path
array column, as
arrays are sorted by comparing elements from first to last:
WITH RECURSIVE emp_hierarchy (empno, ename, job, mgr, path) AS (
SELECT empno, ename, job, mgr, '/' || ename AS path
FROM emp
WHERE mgr IS NULL
UNION ALL
SELECT emp.empno, emp.ename, emp.job, emp.mgr, prior.path || '/' || emp.ename
FROM emp
JOIN emp_hierarchy prior ON (emp.mgr = prior.empno)
)
SELECT * FROM emp_hierarchy
ORDER BY path;
However, this emulation only works with an ascending sort. Applying a descending sort does not return the expected result.
CONNECT_BY_ROOT clause #
The CONNECT_BY_ROOT
clause returns the root of the hierarchy for each element.
In the following example, the last column will return the topmost person in the
hierarchy of the current employee:
SELECT empno, ename, job, mgr AS direct_mgr,
CONNECT_BY_ROOT ename AS mgr
FROM emp
START WITH mgr IS NULL
CONNECT BY mgr = PRIOR empno
ORDER SIBLINGS BY ename DESC;
This is converted the same way as the SYS_CONNECT_BY_PATH
query. The path
array can simply be used to get the topmost element.
WITH RECURSIVE emp_hierarchy (empno, ename, job, mgr, path) AS (
SELECT empno, ename, job, mgr, ARRAY[ename::TEXT] AS path
FROM emp
WHERE mgr IS NULL
UNION ALL
SELECT emp.empno, emp.ename, emp.job, emp.mgr,
prior.path || emp.ename::TEXT AS path
FROM emp
JOIN emp_hierarchy prior ON (emp.mgr = prior.empno)
)
SELECT empno, ename, job, path[1] AS connect_by_root
FROM emp_hierarchy AS emp;
CONNECT_BY_ISLEAF clause #
CONNECT_BY_ISLEAF
takes no arguments. It specifies whether current row (called
leaf) is no longer connected to a deeper row through the hierarchy tree. The
value 0
is returned if deeper rows exist and 1
if current row is the last in
the hierarchy.
SELECT empno, ename, job, mgr,
CONNECT_BY_ROOT ename AS mgr,
CONNECT_BY_ISLEAF ASisleaf,
SYS_CONNECT_BY_PATH(ename, '/') AS path
FROM emp
START WITH mgr IS NULL
CONNECT BY mgr = PRIOR empno
CONNECT_BY_ISLEAF
is a bit harder to emulate, as we are using a
breadth-first algorithm: we can’t know if a record is a leaf before having
done the next iteration of the recursive join. We’ll have to identify leaf
records with a second pass.
For instance, we can use a window function to compare each found path to the
following (sorted by path
): if the next path is longer than the current, we
are not a leaf. Else, we are a leaf. Here is a rewrite doing this:
WITH RECURSIVE emp_hierarchy (empno, ename, job, mgr, path) AS (
SELECT empno, ename, job, mgr, ARRAY[ename::TEXT] AS path
FROM emp
WHERE mgr IS NULL
UNION ALL
SELECT emp.empno, emp.ename, emp.job, emp.mgr,
prior.path || emp.ename::TEXT AS path
FROM emp
JOIN emp_hierarchy prior ON (emp.mgr = prior.empno)
)
SELECT emp.empno, emp.ename, emp.job,
CASE WHEN leaf.empno IS NULL THEN 1 ELSE 0 END AS isleaf
FROM emp_hierarchy AS emp
LEFT JOIN emp_hierarchy AS leaf ON (emp.empno = leaf.mgr)
ORDER BY emp.path;