Guide · 7 min read

GROUP BY and HAVING, Clearly

Why WHERE runs before GROUP BY, why HAVING runs after, and how to stop getting cryptic 'column not in GROUP BY' errors.

  • Execution order
  • Aggregate functions
  • GROUP BY vs DISTINCT
  • HAVING vs WHERE

Every confusion about GROUP BY and HAVING boils down to one thing: they run at different points in the query. Get the execution order right once and nothing about these clauses is tricky again.

Execution order (simplified)

  1. FROM and JOIN: assemble the raw rows.
  2. WHERE: filter rows before grouping.
  3. GROUP BY: collapse rows into one per group.
  4. Aggregate functions (COUNT, SUM, ...): computed on each group.
  5. HAVING: filter groups based on aggregate results.
  6. SELECT projection: choose columns to return.
  7. ORDER BY, LIMIT.

The cardinal rule

Every column in the SELECT list must be either in the GROUP BY or inside an aggregate. Otherwise the database cannot know which row to pick.

-- Broken: sales is not grouped and is not aggregated
SELECT region, sales
FROM orders
GROUP BY region;

-- Fix 1: group by both
SELECT region, sales
FROM orders
GROUP BY region, sales;

-- Fix 2 (usually what you wanted): aggregate sales
SELECT region, SUM(sales) AS total_sales
FROM orders
GROUP BY region;

WHERE vs HAVING

Use WHERE to drop individual rows before grouping; use HAVING to drop whole groups based on their aggregate.

-- Regions whose 2026 sales exceed $1M
SELECT region, SUM(amount) AS total
FROM orders
WHERE order_year = 2026          -- per-row filter
GROUP BY region
HAVING SUM(amount) > 1000000     -- per-group filter
ORDER BY total DESC;

Filters that do not depend on aggregates belong in WHERE because WHERE runs first and shrinks the dataset the engine has to group. Putting them in HAVING works but is slower.

GROUP BY vs DISTINCT

SELECT DISTINCT col and SELECT col FROM t GROUP BY col return the same rows. Use DISTINCT when you only need uniqueness, and GROUP BY when you also want to aggregate.

COUNT quirks

  • COUNT(*): every row, including those with NULLs.
  • COUNT(col): rows where col is not NULL.
  • COUNT(DISTINCT col): unique non-NULL values.

Grouping by expressions

You can group by anything that is not an aggregate. A common example is bucketing by month:

SELECT
  DATE_TRUNC('month', created_at) AS month,
  COUNT(*) AS signups
FROM users
GROUP BY DATE_TRUNC('month', created_at)
ORDER BY month;

Some databases let you write GROUP BY 1 to group by the first column in the projection, which is handy in interviews but fragile in production code.

Common mistakes

  • Putting an aggregate in WHERE. Errors out. Aggregates only exist after grouping; use HAVING.
  • Grouping by a derived column alias. Standard SQL requires you to repeat the expression; PostgreSQL and MySQL allow the alias but others do not.
  • Forgetting that ORDER BY happens after the projection. Referring to an aggregate by its alias in ORDER BY is fine in most databases.

Practice this live

The Exchange chapter introduces COUNT, SUM, and GROUP BY. The Director chapter pushes further into HAVING and complex multi-join aggregations.

Drill this in the game

These chapters practice exactly what this guide covers:

Ready to write real SQL?

Sign in with Google. Free, no credit card, no installation. Desktop only.

Play SQL Protocol