A subquery is a query inside another query. Where you put it and how it references the outer query tells the engine how to run it. Three shapes cover most real code: scalar, IN, and EXISTS.
Scalar subqueries
A scalar subquery returns exactly one row with one column. You can drop it anywhere a value is allowed.
SELECT
name,
salary,
(SELECT AVG(salary) FROM employees) AS company_avg
FROM employees;The inner query runs once, its single value is attached to every row of the outer query. If the subquery returns more than one row, the engine throws an error. Guard with LIMIT 1 or an aggregate when appropriate.
IN
SELECT id, name
FROM users
WHERE id IN (SELECT user_id FROM orders WHERE total > 100);IN is a readable way to filter by membership in another set. Watch out for NOT IN when the subquery can return NULL: NOT IN (1, NULL) is always false, so the whole outer query returns no rows. Use NOT EXISTS instead.
EXISTS and NOT EXISTS
-- Users who have placed at least one order
SELECT id, name
FROM users u
WHERE EXISTS (
SELECT 1 FROM orders o WHERE o.user_id = u.id
);EXISTS is true if the inner query produces any row, false otherwise. The select list inside is irrelevant; the idiom is SELECT 1.
-- Users who have NEVER placed an order (NULL-safe)
SELECT id, name
FROM users u
WHERE NOT EXISTS (
SELECT 1 FROM orders o WHERE o.user_id = u.id
);Because NOT EXISTS does not depend on three- valued logic, it gives you the answer you expect even when the joined column contains NULLs.
Correlated vs non-correlated
A non-correlated subquery can be evaluated independently of the outer query. The engine usually runs it once and reuses the result.
A correlated subquery references a column from the outer query. Logically it has to run once per outer row, though planners often rewrite it.
-- Correlated: the inner query depends on u.id
SELECT u.id, u.name
FROM users u
WHERE (
SELECT COUNT(*) FROM orders o WHERE o.user_id = u.id
) > 5;If this pattern shows up in a hot path, measure it. You can often rewrite as a JOIN with a GROUP BY and get identical results faster.
IN vs EXISTS performance
Modern planners (PostgreSQL, SQL Server, Oracle) usually turn both into the same plan. Older guides claim one is faster than the other; treat those claims as lore and measure in your own database.
Common mistakes
- Using
NOT INwith a nullable column. UseNOT EXISTS. - Writing a scalar subquery that can return more than one row. Add
LIMIT 1or an aggregate. - Reaching for a correlated subquery when a JOIN or a window function would be clearer.
Practice this live
The Deep chapter in SQL Protocol walks through scalar subqueries, IN, and EXISTS with worked examples. Interview Mode Floors 2 and 3 regularly test correlated-subquery patterns.