Section C · Patterns

SQL Pattern Menu

The patterns that cover ~80% of analytics-engineer SQL interview questions. Memorize the cues. Drill the implementations in 11-sql-problems.

Recognition cues — the senior tell

The fastest path to looking senior in SQL interviews is naming the pattern out loud before writing code. The interviewer relaxes; they know you'll get there.

"Most recent" / "latest"

→ ROW_NUMBER pattern, latest-per-group. Always specify a tiebreaker.
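A minimal sketch of the latest-per-group shape (table and column names here are illustrative):

```sql
-- Latest order per customer; order_id breaks ties on identical timestamps.
SELECT *
FROM (
  SELECT o.*,
    ROW_NUMBER() OVER (
      PARTITION BY customer_id
      ORDER BY ordered_at DESC, order_id DESC  -- tiebreaker keeps it deterministic
    ) AS rn
  FROM orders o
) t
WHERE rn = 1;
```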

"Each customer" / "per group"

→ PARTITION BY. Window function if you need the per-group value on every row; GROUP BY if you want one aggregate row per group.

"Cumulative" / "running" / "rolling"

→ Window aggregate with explicit ROWS BETWEEN frame.
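A sketch of both frames on a hypothetical daily_sales table — the explicit ROWS frame avoids the default RANGE frame's surprises with duplicate order keys:

```sql
-- Running total plus a trailing 7-row average.
SELECT sale_date, amount,
  SUM(amount) OVER (
    ORDER BY sale_date
    ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
  ) AS running_total,
  AVG(amount) OVER (
    ORDER BY sale_date
    ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
  ) AS rolling_7
FROM daily_sales;
```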

"vs previous" / "MoM" / "WoW"

→ LAG. Don't forget NULLIF on the denominator for percentages.
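The shape, on an assumed monthly_revenue table — NULLIF turns a zero denominator into NULL instead of a divide-by-zero error:

```sql
-- Month-over-month growth rate.
SELECT month, revenue,
  (revenue - LAG(revenue) OVER (ORDER BY month))
    / NULLIF(LAG(revenue) OVER (ORDER BY month), 0) AS mom_growth
FROM monthly_revenue;
```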

"Continuous" / "streak" / "consecutive"

→ Gaps-and-islands. Subtract ROW_NUMBER from the date; the difference is constant within each consecutive run.
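A sketch with an illustrative daily_activity table; the date-minus-integer arithmetic is Postgres-style and varies by warehouse:

```sql
-- Consecutive-day streaks: date minus row number is constant within a streak.
WITH numbered AS (
  SELECT user_id, activity_date,
    activity_date - CAST(ROW_NUMBER() OVER (
      PARTITION BY user_id ORDER BY activity_date
    ) AS int) AS grp                      -- constant per island
  FROM daily_activity
)
SELECT user_id,
  MIN(activity_date) AS streak_start,
  COUNT(*)           AS streak_len
FROM numbered
GROUP BY user_id, grp;
```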

"Sessions" / "events within X minutes"

→ Sessionization: LAG + cumulative-sum island.
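A sketch assuming an events table with user_id and ts, and a 30-minute inactivity threshold; interval syntax is Postgres-style:

```sql
-- Flag a new session when the gap since the previous event exceeds 30 minutes
-- (or there is no previous event); the running sum of flags is the session id.
WITH flagged AS (
  SELECT user_id, ts,
    CASE WHEN LAG(ts) OVER (PARTITION BY user_id ORDER BY ts) IS NULL
           OR ts - LAG(ts) OVER (PARTITION BY user_id ORDER BY ts)
              > INTERVAL '30 minutes'
         THEN 1 ELSE 0 END AS is_new_session
  FROM events
)
SELECT user_id, ts,
  SUM(is_new_session) OVER (
    PARTITION BY user_id ORDER BY ts
    ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
  ) AS session_id
FROM flagged;
```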

"Unique by X" / "dedupe"

→ ROW_NUMBER with multi-key ORDER BY for tiebreaking.

"Cohort retention"

→ Cohort assignment → activity join → optional pivot to wide.
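The long-format core of the pattern, on an assumed activity table; DATEDIFF syntax shown is Snowflake-style and varies by warehouse:

```sql
-- Cohort = month of first activity; retention = distinct users active
-- N months after their cohort month.
WITH cohorts AS (
  SELECT user_id, DATE_TRUNC('month', MIN(activity_date)) AS cohort_month
  FROM activity
  GROUP BY user_id
)
SELECT c.cohort_month,
  DATEDIFF('month', c.cohort_month,
           DATE_TRUNC('month', a.activity_date)) AS month_offset,
  COUNT(DISTINCT a.user_id) AS active_users
FROM cohorts c
JOIN activity a USING (user_id)
GROUP BY 1, 2
ORDER BY 1, 2;
```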

"Funnel" / "conversion rate"

→ MIN of timestamp per event-type per user, then ordering checks.

"Wide format" / "one row per X with columns"

→ Pivot. CASE WHEN sums are portable; PIVOT operator is shorter on supported warehouses.

Per-group calculations — when window vs GROUP BY

  • GROUP BY: collapse to one row per group. You lose the original rows.
  • Window function (OVER PARTITION BY): compute per group but keep all original rows.

Common pattern: compute in a window, filter, then aggregate. Example: "average revenue per customer's top 3 orders" — window-rank, filter to rank ≤ 3, then group + avg.
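That example, sketched against a hypothetical orders table:

```sql
-- Average revenue across each customer's top-3 orders by amount.
WITH ranked AS (
  SELECT customer_id, amount,
    ROW_NUMBER() OVER (
      PARTITION BY customer_id ORDER BY amount DESC
    ) AS rk
  FROM orders
)
SELECT customer_id, AVG(amount) AS avg_top3
FROM ranked
WHERE rk <= 3            -- filter on the window result first
GROUP BY customer_id;    -- then aggregate
```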

Time-based analysis

  • Period bucketing: DATE_TRUNC('month', ts) for monthly aggregation.
  • Filling gaps: LEFT JOIN to dim_date to ensure every day has a row.
  • Period-over-period: LAG with the right ordering.
  • Year-over-year: self-join on curr.year = prev.year + 1, or LAG with offset 12 on a gap-free monthly series.
  • Window over time: cumulative and rolling are frame variations; same-period-last-year is LAG with the right offset.
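The gap-filling bullet, sketched with an assumed dim_date spine and sales table — the spine guarantees a row per day, and COALESCE turns missing days into explicit zeros:

```sql
-- One row per calendar day, zero-filled where no sales occurred.
SELECT d.date_day,
  COALESCE(SUM(s.amount), 0) AS revenue
FROM dim_date d
LEFT JOIN sales s ON s.sale_date = d.date_day
GROUP BY d.date_day
ORDER BY d.date_day;
```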

Deduplication

The pattern: ROW_NUMBER over the candidate key with a tiebreaker that picks the "right" duplicate.

dedup pattern
WITH ranked AS (
  SELECT *,
    ROW_NUMBER() OVER (
      PARTITION BY canonical_key
      ORDER BY
        prefer_this DESC,    -- primary preference (e.g. updated_at DESC)
        (col IS NOT NULL) DESC,  -- prefer non-null
        id ASC                -- final tiebreaker — deterministic
    ) AS rn
  FROM source
)
SELECT * FROM ranked WHERE rn = 1;

The final deterministic tiebreaker (a unique column) matters — without it, ties are non-deterministic and re-runs give different "winners."

Sequential / funnel analysis

The "what % of users went from step A to step B to step C" pattern.

Canonical approach: one row per user, MIN of timestamp for each event type, then comparisons:

funnel pattern
WITH user_events AS (
  SELECT user_id,
    MIN(CASE WHEN event = 'A' THEN ts END) AS first_a,
    MIN(CASE WHEN event = 'B' THEN ts END) AS first_b,
    MIN(CASE WHEN event = 'C' THEN ts END) AS first_c
  FROM events GROUP BY user_id
)
SELECT
  COUNT(*) AS step_a,
  COUNT(*) FILTER (WHERE first_b >= first_a) AS step_b,  -- B after A
  COUNT(*) FILTER (WHERE first_b >= first_a
               AND first_c >= first_b) AS step_c         -- C after B, among A→B converters
FROM user_events
WHERE first_a IS NOT NULL;

Reshape — pivot & unpivot

CASE WHEN pivot works everywhere; PIVOT/UNPIVOT operators are shorter where supported.
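The portable version, sketched on a hypothetical long-format sales table with hard-coded region values:

```sql
-- Long-to-wide with conditional aggregation: one row per month,
-- one column per region.
SELECT month,
  SUM(CASE WHEN region = 'EMEA' THEN revenue ELSE 0 END) AS emea_revenue,
  SUM(CASE WHEN region = 'AMER' THEN revenue ELSE 0 END) AS amer_revenue,
  SUM(CASE WHEN region = 'APAC' THEN revenue ELSE 0 END) AS apac_revenue
FROM sales_long
GROUP BY month;
```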

For dynamic pivot (unknown column values at write time), either generate SQL in app code, use dbt_utils.pivot macro, or just return long format and pivot in the BI tool.

Drill these

Pattern recognition is the high-value drill. 11-sql-problems has worked examples of each pattern with multiple approaches. Solve them on a real warehouse, not in your head.