Subqueries and EXISTS: Correlated vs Non-Correlated

A subquery is a query inside another query. Where you put it and how it references the outer query tells the engine how to run it. Three shapes cover most real code: scalar, IN, and EXISTS.

Scalar subqueries

A scalar subquery returns exactly one row with one column. You can drop it anywhere a value is allowed.

SELECT
  name,
  salary,
  (SELECT AVG(salary) FROM employees) AS company_avg
FROM employees;

The inner query runs once, its single value is attached to every row of the outer query. If the subquery returns more than one row, the engine throws an error. Guard with LIMIT 1 or an aggregate when appropriate.

IN

SELECT id, name
FROM users
WHERE id IN (SELECT user_id FROM orders WHERE total > 100);

IN is a readable way to filter by membership in another set. Watch out for NOT IN when the subquery can return NULL: NOT IN (1, NULL) is always false, so the whole outer query returns no rows. Use NOT EXISTS instead.

This is documented behavior, not a bug. The PostgreSQL documentation warns that if “at least one right-hand row yields null, the result of the NOT IN construct will be null, not true”, and a NULL in a WHERE clause filters the row out.

EXISTS and NOT EXISTS

-- Users who have placed at least one order
SELECT id, name
FROM users u
WHERE EXISTS (
  SELECT 1 FROM orders o WHERE o.user_id = u.id
);

EXISTS is true if the inner query produces any row, false otherwise. The select list inside is irrelevant; the idiom is SELECT 1. The same docs page defines the semantics exactly: “The subquery is evaluated to determine whether it returns any rows. If it returns at least one row, the result of EXISTS is ‘true’; if the subquery returns no rows, the result of EXISTS is ‘false’.”

-- Users who have NEVER placed an order (NULL-safe)
SELECT id, name
FROM users u
WHERE NOT EXISTS (
  SELECT 1 FROM orders o WHERE o.user_id = u.id
);

Because NOT EXISTS does not depend on three- valued logic, it gives you the answer you expect even when the joined column contains NULLs.

Correlated vs non-correlated

A non-correlated subquery can be evaluated independently of the outer query. The engine usually runs it once and reuses the result.

A correlated subquery references a column from the outer query. Logically it has to run once per outer row, though planners often rewrite it.

-- Correlated: the inner query depends on u.id
SELECT u.id, u.name
FROM users u
WHERE (
  SELECT COUNT(*) FROM orders o WHERE o.user_id = u.id
) > 5;

If this pattern shows up in a hot path, measure it. You can often rewrite as a JOIN with a GROUP BY and get identical results faster.

IN vs EXISTS performance

Modern planners (PostgreSQL, SQL Server, Oracle) usually turn both into the same plan. Older guides claim one is faster than the other; treat those claims as lore and measure in your own database. Per the 2025 Stack Overflow Developer Survey, that database is most likely PostgreSQL (55.6% of respondents), MySQL (40.5%), or SQLite (37.5%), and each ships an EXPLAIN command that shows the plan it actually chose.

Common mistakes

Using NOT IN with a nullable column. Use NOT EXISTS.
Writing a scalar subquery that can return more than one row. Add LIMIT 1 or an aggregate.
Reaching for a correlated subquery when a JOIN or a window function would be clearer.

Practice this live

The Deep chapter in SQL Protocol walks through subquery filtering with IN and NOT IN against worked examples. Interview Mode Floors 2 and 3 regularly test the same subquery patterns. If your subqueries keep turning into joins, brush up with SQL joins explained.

Subqueries and EXISTS, Demystified