Subqueries
Subqueries (also known as inner queries or nested queries) are queries within
a query.
Subqueries can be used in SELECT
, FROM
, WHERE
, and HAVING
clauses.
The examples below are based on the following tables.
SELECT * FROM x;
+----------+----------+
| column_1 | column_2 |
+----------+----------+
| 1 | 2 |
+----------+----------+
| 2 | 4 |
+----------+----------+
SELECT * FROM y;
+--------+--------+
| number | string |
+--------+--------+
| 1 | one |
+--------+--------+
| 2 | two |
+--------+--------+
| 3 | three |
+--------+--------+
| 4 | four |
+--------+--------+
Subquery operators
[ NOT ] EXISTS
The EXISTS
operator returns all rows where a
correlated subquery produces one or more matches for
that row. NOT EXISTS
returns all rows where a correlated subquery produces
zero matches for that row. Only correlated subqueries are supported.
[NOT] EXISTS (subquery)
[ NOT ] IN
The IN
operator returns all rows where a given expression’s value can be found
in the results of a correlated subquery.
NOT IN
returns all rows where a given expression’s value cannot be found in
the results of a subquery or list of values.
expression [NOT] IN (subquery|list-literal)
Examples
SELECT * FROM x WHERE column_1 IN (1,3);
+----------+----------+
| column_1 | column_2 |
+----------+----------+
| 1 | 2 |
+----------+----------+
SELECT * FROM x WHERE column_1 NOT IN (1,3);
+----------+----------+
| column_1 | column_2 |
+----------+----------+
| 2 | 4 |
+----------+----------+
SELECT clause subqueries
SELECT
clause subqueries use values returned from the inner query as part
of the outer query's SELECT
list.
The SELECT
clause only supports scalar subqueries that
return a single value per execution of the inner query.
The returned value can be unique per row.
SELECT [expression1[, expression2, ..., expressionN],] (<subquery>)
Note: SELECT
clause subqueries can be used as an alternative to JOIN
operations.
Example
SELECT
column_1,
(
SELECT
first_value(string)
FROM
y
WHERE
number = x.column_1
) AS "numeric string"
FROM
x;
+----------+----------------+
| column_1 | numeric string |
+----------+----------------+
| 1 | one |
| 2 | two |
+----------+----------------+
FROM clause subqueries
FROM
clause subqueries return a set of results that is then queried and
operated on by the outer query.
SELECT expression1[, expression2, ..., expressionN] FROM (<subquery>)
Example
The following query returns the average of maximum values per room. The inner query returns the maximum value for each field from each room. The outer query uses the results of the inner query and returns the average maximum value for each field.
SELECT
column_2
FROM
(
SELECT
*
FROM
x
WHERE
column_1 > 1
);
+----------+
| column_2 |
+----------+
| 4 |
+----------+
WHERE clause subqueries
WHERE
clause subqueries compare an expression to the result of the subquery
and return true or false.
Rows that evaluate to false or NULL are filtered from results.
The WHERE
clause supports correlated and non-correlated subqueries
as well as scalar and non-scalar subqueries (depending on the the operator used
in the predicate expression).
SELECT
expression1[, expression2, ..., expressionN]
FROM
<measurement>
WHERE
expression operator (<subquery>)
Note: WHERE
clause subqueries can be used as an alternative to JOIN
operations.
Examples
WHERE
clause with scalar subquery
The following query returns all rows with column_2
values above the average
of all number
values in y
.
SELECT
*
FROM
x
WHERE
column_2 > (
SELECT
AVG(number)
FROM
y
);
+----------+----------+
| column_1 | column_2 |
+----------+----------+
| 2 | 4 |
+----------+----------+
WHERE
clause with non-scalar subquery
Non-scalar subqueries must use the [NOT] IN
or [NOT] EXISTS
operators and
can only return a single column.
The values in the returned column are evaluated as a list.
The following query returns all rows with column_2
values in table x
that
are in the list of numbers with string lengths greater than three from table
y
.
SELECT
*
FROM
x
WHERE
column_2 IN (
SELECT
number
FROM
y
WHERE
length(string) > 3
);
+----------+----------+
| column_1 | column_2 |
+----------+----------+
| 2 | 4 |
+----------+----------+
WHERE
clause with correlated subquery
The following query returns rows with column_2
values from table x
greater
than the average string
value length from table y
.
The subquery in the WHERE
clause uses the column_1
value from the outer
query to return the average string
value length for that specific value.
SELECT
*
FROM
x
WHERE
column_2 > (
SELECT
AVG(length(string))
FROM
y
WHERE
number = x.column_1
);
+----------+----------+
| column_1 | column_2 |
+----------+----------+
| 2 | 4 |
+----------+----------+
HAVING clause subqueries
HAVING
clause subqueries compare an expression that uses aggregate values
returned by aggregate functions in the SELECT
clause to the result of the
subquery and return true or false.
Rows that evaluate to false are filtered from results.
The HAVING
clause supports correlated and non-correlated subqueries
as well as scalar and non-scalar subqueries (depending on the the operator used
in the predicate expression).
SELECT
aggregate_expression1[, aggregate_expression2, ..., aggregate_expressionN]
FROM
<measurement>
WHERE
<conditional_expression>
GROUP BY
column_expression1[, column_expression2, ..., column_expressionN]
HAVING
expression operator (<subquery>)
Examples
The following query calculates the averages of even and odd numbers in table y
and returns the averages that are equal to the maximum value of column_1
in table x
.
HAVING
clause with a scalar subquery
SELECT
AVG(number) AS avg,
(number % 2 = 0) AS even
FROM
y
GROUP BY
even
HAVING
avg = (
SELECT
MAX(column_1)
FROM
x
);
+-------+--------+
| avg | even |
+-------+--------+
| 2 | false |
+-------+--------+
HAVING
clause with a non-scalar subquery
Non-scalar subqueries must use the [NOT] IN
or [NOT] EXISTS
operators and
can only return a single column.
The values in the returned column are evaluated as a list.
The following query calculates the averages of even and odd numbers in table y
and returns the averages that are in column_1
of table x
.
SELECT
AVG(number) AS avg,
(number % 2 = 0) AS even
FROM
y
GROUP BY
even
HAVING
avg IN (
SELECT
column_1
FROM
x
);
+-------+--------+
| avg | even |
+-------+--------+
| 2 | false |
+-------+--------+
Subquery categories
Subqueries can be categorized as one or more of the following based on the behavior of the subquery:
Correlated subqueries
In a correlated subquery, the inner query depends on the values of the current row being processed.
Note: DataFusion internally rewrites correlated subqueries into JOINs to improve performance. In general correlated subqueries are less performant than non-correlated subqueries.
Non-correlated subqueries
In a non-correlated subquery, the inner query doesn't depend on the outer query and executes independently. The inner query executes first, and then passes the results to the outer query.
Scalar subqueries
A scalar subquery returns a single value (one column of one row). If no rows are returned, the subquery returns NULL.
Non-scalar subqueries
A non-scalar subquery returns 0, 1, or multiple rows, each of which may contain 1 or multiple columns. For each column, if there is no value to return, the subquery returns NULL. If no rows qualify to be returned, the subquery returns 0 rows.