Section C · Patterns

SQL Pattern Menu

The patterns that cover ~80% of analytics-engineer SQL interview questions. Memorize the cues. Drill the implementations in 11-sql-problems.

Recognition cues — the senior tell

The fastest path to looking senior in SQL interviews is naming the pattern out loud before writing code. The interviewer relaxes; they know you'll get there.

"Most recent" / "latest"

→ ROW_NUMBER pattern, latest-per-group. Always specify a tiebreaker.
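A minimal sketch of the latest-per-group shape (table and column names here are illustrative):

```sql
-- Latest order per customer; order_id breaks ties on identical timestamps.
SELECT *
FROM (
  SELECT o.*,
    ROW_NUMBER() OVER (
      PARTITION BY customer_id
      ORDER BY ordered_at DESC, order_id DESC  -- tiebreaker keeps it deterministic
    ) AS rn
  FROM orders o
) t
WHERE rn = 1;
```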

"Each customer" / "per group"

→ PARTITION BY. Window function if you need the per-group value on every row; GROUP BY if you want one aggregate row per group.

"Cumulative" / "running" / "rolling"

→ Window aggregate with explicit ROWS BETWEEN frame.
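A sketch of both frames on a hypothetical daily_sales table — the explicit ROWS frame avoids the default RANGE frame's surprises with duplicate order keys:

```sql
-- Running total plus a trailing 7-row average.
SELECT sale_date, amount,
  SUM(amount) OVER (
    ORDER BY sale_date
    ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
  ) AS running_total,
  AVG(amount) OVER (
    ORDER BY sale_date
    ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
  ) AS rolling_7
FROM daily_sales;
```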

"vs previous" / "MoM" / "WoW"

→ LAG. Don't forget NULLIF on the denominator for percentages.
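The shape, on an assumed monthly_revenue table — NULLIF turns a zero denominator into NULL instead of a divide-by-zero error:

```sql
-- Month-over-month growth rate.
SELECT month, revenue,
  (revenue - LAG(revenue) OVER (ORDER BY month))
    / NULLIF(LAG(revenue) OVER (ORDER BY month), 0) AS mom_growth
FROM monthly_revenue;
```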

"Continuous" / "streak" / "consecutive"

→ Gaps-and-islands. Subtract ROW_NUMBER from the date; the difference is constant within each consecutive run.
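A sketch with an illustrative daily_activity table; the date-minus-integer arithmetic is Postgres-style and varies by warehouse:

```sql
-- Consecutive-day streaks: date minus row number is constant within a streak.
WITH numbered AS (
  SELECT user_id, activity_date,
    activity_date - CAST(ROW_NUMBER() OVER (
      PARTITION BY user_id ORDER BY activity_date
    ) AS int) AS grp                      -- constant per island
  FROM daily_activity
)
SELECT user_id,
  MIN(activity_date) AS streak_start,
  COUNT(*)           AS streak_len
FROM numbered
GROUP BY user_id, grp;
```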

"Sessions" / "events within X minutes"

→ Sessionization: LAG + cumulative-sum island.
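A sketch assuming an events table with user_id and ts, and a 30-minute inactivity threshold; interval syntax is Postgres-style:

```sql
-- Flag a new session when the gap since the previous event exceeds 30 minutes
-- (or there is no previous event); the running sum of flags is the session id.
WITH flagged AS (
  SELECT user_id, ts,
    CASE WHEN LAG(ts) OVER (PARTITION BY user_id ORDER BY ts) IS NULL
           OR ts - LAG(ts) OVER (PARTITION BY user_id ORDER BY ts)
              > INTERVAL '30 minutes'
         THEN 1 ELSE 0 END AS is_new_session
  FROM events
)
SELECT user_id, ts,
  SUM(is_new_session) OVER (
    PARTITION BY user_id ORDER BY ts
    ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
  ) AS session_id
FROM flagged;
```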

"Unique by X" / "dedupe"

→ ROW_NUMBER with multi-key ORDER BY for tiebreaking.

"Cohort retention"

→ Cohort assignment → activity join → optional pivot to wide.
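The long-format core of the pattern, on an assumed activity table; DATEDIFF syntax shown is Snowflake-style and varies by warehouse:

```sql
-- Cohort = month of first activity; retention = distinct users active
-- N months after their cohort month.
WITH cohorts AS (
  SELECT user_id, DATE_TRUNC('month', MIN(activity_date)) AS cohort_month
  FROM activity
  GROUP BY user_id
)
SELECT c.cohort_month,
  DATEDIFF('month', c.cohort_month,
           DATE_TRUNC('month', a.activity_date)) AS month_offset,
  COUNT(DISTINCT a.user_id) AS active_users
FROM cohorts c
JOIN activity a USING (user_id)
GROUP BY 1, 2
ORDER BY 1, 2;
```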

"Funnel" / "conversion rate"

→ MIN of timestamp per event-type per user, then ordering checks.

"Wide format" / "one row per X with columns"

→ Pivot. CASE WHEN sums are portable; PIVOT operator is shorter on supported warehouses.

Per-group calculations — when window vs GROUP BY

  • GROUP BY: collapse to one row per group. You lose the original rows.
  • Window function (OVER PARTITION BY): compute per group but keep all original rows.

Common pattern: compute in a window, filter, then aggregate. Example: "average revenue per customer's top 3 orders" — window-rank, filter to rank ≤ 3, then group + avg.
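That example, sketched against a hypothetical orders table:

```sql
-- Average revenue across each customer's top-3 orders by amount.
WITH ranked AS (
  SELECT customer_id, amount,
    ROW_NUMBER() OVER (
      PARTITION BY customer_id ORDER BY amount DESC
    ) AS rk
  FROM orders
)
SELECT customer_id, AVG(amount) AS avg_top3
FROM ranked
WHERE rk <= 3            -- filter on the window result first
GROUP BY customer_id;    -- then aggregate
```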

Time-based analysis

  • Period bucketing: DATE_TRUNC('month', ts) for monthly aggregation.
  • Filling gaps: LEFT JOIN to dim_date to ensure every day has a row.
  • Period-over-period: LAG with the right ordering.
  • Year-over-year: self-join on curr.year = prev.year + 1, or LAG with offset 12 on a gap-free monthly series.
  • Window over time: cumulative and rolling are frame variations; same-period-last-year is LAG with the right offset.
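The gap-filling bullet, sketched with an assumed dim_date spine and sales table — the spine guarantees a row per day, and COALESCE turns missing days into explicit zeros:

```sql
-- One row per calendar day, zero-filled where no sales occurred.
SELECT d.date_day,
  COALESCE(SUM(s.amount), 0) AS revenue
FROM dim_date d
LEFT JOIN sales s ON s.sale_date = d.date_day
GROUP BY d.date_day
ORDER BY d.date_day;
```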

Deduplication

The pattern: ROW_NUMBER over the candidate key with a tiebreaker that picks the "right" duplicate.

dedup pattern
WITH ranked AS (
  SELECT *,
    ROW_NUMBER() OVER (
      PARTITION BY canonical_key
      ORDER BY
        prefer_this DESC,    -- primary preference (e.g. updated_at DESC)
        (col IS NOT NULL) DESC,  -- prefer non-null
        id ASC                -- final tiebreaker — deterministic
    ) AS rn
  FROM source
)
SELECT * FROM ranked WHERE rn = 1;

The final deterministic tiebreaker (a unique column) matters — without it, ties are non-deterministic and re-runs give different "winners."

Sequential / funnel analysis

The "what % of users went from step A to step B to step C" pattern.

Canonical approach: one row per user, MIN of timestamp for each event type, then comparisons:

funnel pattern
WITH user_events AS (
  SELECT user_id,
    MIN(CASE WHEN event = 'A' THEN ts END) AS first_a,
    MIN(CASE WHEN event = 'B' THEN ts END) AS first_b,
    MIN(CASE WHEN event = 'C' THEN ts END) AS first_c
  FROM events GROUP BY user_id
)
SELECT
  COUNT(*) AS step_a,
  COUNT(*) FILTER (WHERE first_b >= first_a) AS step_b,  -- B after A
  COUNT(*) FILTER (WHERE first_b >= first_a
               AND first_c >= first_b) AS step_c         -- C after B, among A→B converters
FROM user_events
WHERE first_a IS NOT NULL;

Reshape — pivot & unpivot

CASE WHEN pivot works everywhere; PIVOT/UNPIVOT operators are shorter where supported.
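The portable version, sketched on a hypothetical long-format sales table with hard-coded region values:

```sql
-- Long-to-wide with conditional aggregation: one row per month,
-- one column per region.
SELECT month,
  SUM(CASE WHEN region = 'EMEA' THEN revenue ELSE 0 END) AS emea_revenue,
  SUM(CASE WHEN region = 'AMER' THEN revenue ELSE 0 END) AS amer_revenue,
  SUM(CASE WHEN region = 'APAC' THEN revenue ELSE 0 END) AS apac_revenue
FROM sales_long
GROUP BY month;
```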

For dynamic pivot (unknown column values at write time), either generate SQL in app code, use dbt_utils.pivot macro, or just return long format and pivot in the BI tool.

Drill these

Pattern recognition is the high-value drill. 11-sql-problems has worked examples of each pattern with multiple approaches. Solve them on a real warehouse, not in your head.