Every confusion about GROUP BY and HAVING boils down to one thing: they run at different points in the query. Get the execution order right once and nothing about these clauses is tricky again.
Execution order (simplified)
FROMandJOIN: assemble the raw rows.WHERE: filter rows before grouping.GROUP BY: collapse rows into one per group.- Aggregate functions (
COUNT,SUM, ...): computed on each group. HAVING: filter groups based on aggregate results.SELECTprojection: choose columns to return.ORDER BY,LIMIT.
The cardinal rule
Every column in the SELECT list must be either in the GROUP BY or inside an aggregate. Otherwise the database cannot know which row to pick.
-- Broken: sales is not grouped and is not aggregated
SELECT region, sales
FROM orders
GROUP BY region;
-- Fix 1: group by both
SELECT region, sales
FROM orders
GROUP BY region, sales;
-- Fix 2 (usually what you wanted): aggregate sales
SELECT region, SUM(sales) AS total_sales
FROM orders
GROUP BY region;WHERE vs HAVING
Use WHERE to drop individual rows before grouping; use HAVING to drop whole groups based on their aggregate.
-- Regions whose 2026 sales exceed $1M
SELECT region, SUM(amount) AS total
FROM orders
WHERE order_year = 2026 -- per-row filter
GROUP BY region
HAVING SUM(amount) > 1000000 -- per-group filter
ORDER BY total DESC;Filters that do not depend on aggregates belong in WHERE because WHERE runs first and shrinks the dataset the engine has to group. Putting them in HAVING works but is slower.
GROUP BY vs DISTINCT
SELECT DISTINCT col and SELECT col FROM t GROUP BY col return the same rows. Use DISTINCT when you only need uniqueness, and GROUP BY when you also want to aggregate.
COUNT quirks
COUNT(*): every row, including those with NULLs.COUNT(col): rows wherecolis not NULL.COUNT(DISTINCT col): unique non-NULL values.
Grouping by expressions
You can group by anything that is not an aggregate. A common example is bucketing by month:
SELECT
DATE_TRUNC('month', created_at) AS month,
COUNT(*) AS signups
FROM users
GROUP BY DATE_TRUNC('month', created_at)
ORDER BY month;Some databases let you write GROUP BY 1 to group by the first column in the projection, which is handy in interviews but fragile in production code.
Common mistakes
- Putting an aggregate in
WHERE. Errors out. Aggregates only exist after grouping; useHAVING. - Grouping by a derived column alias. Standard SQL requires you to repeat the expression; PostgreSQL and MySQL allow the alias but others do not.
- Forgetting that
ORDER BYhappens after the projection. Referring to an aggregate by its alias inORDER BYis fine in most databases.
Practice this live
The Exchange chapter introduces COUNT, SUM, and GROUP BY. The Director chapter pushes further into HAVING and complex multi-join aggregations.