Section C · Analytics

Analytics Fundamentals — SQL & Notebook Fluency for PMs

The "coding" round for a Platform PM is rarely LeetCode. It's SQL on a funnel, a cohort, an A/B test, or an instrumentation gap. Plus notebook and BI tool fluency. This chapter covers the patterns you'll be tested on and the discipline that separates "PM who uses SQL" from "PM who can hold their own with a data scientist."

Why a Platform PM needs SQL

You're going to be the one defining metrics for the platform. Definitions live in queries. The PM who hands off "give me the activation rate" without specifying it gets back a number that's defensible to the data scientist but not what the PM meant. The PM who writes the query gets the truth.

Plus, you'll be asked. Most senior PM loops at fintech / crypto include a SQL round. Sometimes a take-home. Often live.

  • You don't need to be a data engineer.
  • You do need to write a clean funnel, a clean cohort, and a clean A/B test analysis without hand-holding.
  • You should know enough to spot a bad number when one is presented to you.

SQL survival kit

The 80% of SQL that gets PMs through:

  • SELECT + WHERE + GROUP BY + HAVING + ORDER BY — basics.
  • JOIN shapes — INNER, LEFT, FULL OUTER. Know what each does to row counts.
  • CASE WHEN for derived buckets.
  • COUNT, COUNT DISTINCT, SUM, AVG, PERCENTILE_CONT.
  • Date functions — DATE_TRUNC, DATE_DIFF, INTERVAL.
  • CTEs (WITH ... AS) — chain steps readably.
  • Window functions — ROW_NUMBER, LAG, LEAD, SUM OVER.
  • NULL handling — COALESCE, IS NULL, behavior in joins.

The Snowflake / BigQuery / Postgres flavor variations are real but small. Don't memorize three; learn one well and translate.
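
The join-shape point is the most common source of silently wrong numbers, so it's worth seeing concretely. A pandas sketch with hypothetical toy data, showing how each shape changes row counts:

```python
import pandas as pd

# Toy data: three applicants, events for only two of them
applicants = pd.DataFrame({"applicant_id": [1, 2, 3]})
events = pd.DataFrame({"applicant_id": [1, 1, 2],
                       "event_type": ["id_submitted", "verified", "id_submitted"]})

inner = applicants.merge(events, on="applicant_id", how="inner")  # drops applicant 3
left = applicants.merge(events, on="applicant_id", how="left")    # keeps applicant 3, NaN event
outer = applicants.merge(events, on="applicant_id", how="outer")  # union of keys

print(len(inner), len(left), len(outer))  # 3 4 4
```

Applicant 1 fans out to two rows because they have two events, and the inner join silently drops applicant 3 — exactly the behavior that corrupts a funnel denominator.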

The funnel-analysis pattern

Onboarding is a funnel. Almost every analytics task in this seat starts as "what's our funnel doing?" The pattern, written cleanly:

-- Onboarding funnel, weekly cohort, by segment
WITH starts AS (
  SELECT
    applicant_id,
    DATE_TRUNC('week', created_at)         AS cohort_week,
    segment,
    region
  FROM onboarding.applicants
  WHERE created_at >= '2026-01-01'
),
events AS (
  SELECT applicant_id, event_type, occurred_at
  FROM onboarding.events
  WHERE event_type IN (
    'id_submitted', 'verified', 'approved', 'first_deposit'
  )
)
SELECT
  s.cohort_week,
  s.segment,
  s.region,
  COUNT(DISTINCT s.applicant_id)                                AS started,
  COUNT(DISTINCT CASE WHEN e.event_type = 'id_submitted'  THEN s.applicant_id END) AS id_submitted,
  COUNT(DISTINCT CASE WHEN e.event_type = 'verified'      THEN s.applicant_id END) AS verified,
  COUNT(DISTINCT CASE WHEN e.event_type = 'approved'      THEN s.applicant_id END) AS approved,
  COUNT(DISTINCT CASE WHEN e.event_type = 'first_deposit' THEN s.applicant_id END) AS activated
FROM starts s
LEFT JOIN events e USING (applicant_id)
GROUP BY 1, 2, 3
ORDER BY 1 DESC, 2, 3;

Notes a senior PM would call out reviewing this:

  • Population: "started" is anyone who created an applicant record after 2026-01-01. That's a choice. Could also be "anyone who confirmed email." Be explicit.
  • Counting distinct applicants avoids double-counting if events fire twice.
  • Cohort by week of start means each cohort has a fixed N, so rates are stable to compare.
  • LEFT JOIN keeps applicants who never produced events — drop-off shows correctly.
  • Cohort window: you may need to filter cohorts where the most recent ones haven't had time to convert. Otherwise "this week's cohort looks bad" is a measurement artifact.

The cohort retention pattern

Cohort retention by entry-path — useful for "do users who came in via Pro convert differently?"

-- D7 activation by acquisition source
WITH cohorts AS (
  SELECT
    applicant_id,
    DATE(created_at)       AS cohort_day,
    acquisition_source     AS source
  FROM onboarding.applicants
  WHERE created_at BETWEEN '2026-01-01' AND CURRENT_DATE - INTERVAL '7 days'
),
activated AS (
  SELECT
    applicant_id,
    MIN(occurred_at)       AS first_activation
  FROM onboarding.events
  WHERE event_type = 'first_deposit'
  GROUP BY applicant_id
)
SELECT
  c.source,
  COUNT(*)                                                         AS cohort_n,
  COUNT(CASE WHEN a.first_activation <= c.cohort_day + INTERVAL '7 days' THEN 1 END) AS activated_7d,
  ROUND(100.0 * COUNT(CASE WHEN a.first_activation <= c.cohort_day + INTERVAL '7 days' THEN 1 END)
        / NULLIF(COUNT(*), 0), 2)                                  AS d7_activation_pct
FROM cohorts c
LEFT JOIN activated a USING (applicant_id)
GROUP BY 1
ORDER BY cohort_n DESC;

Why cap created_at at CURRENT_DATE - INTERVAL '7 days'? Because if you include cohorts younger than 7 days, they haven't had time to activate, and the newest cohorts look artificially bad. That's a discipline most candidates miss.
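
The same maturity guard in a notebook takes one line. A sketch with made-up cohort data (column names are illustrative):

```python
import pandas as pd

# One row per applicant: cohort day plus a 0/1 flag for activation within 7 days
df = pd.DataFrame({
    "cohort_day": pd.to_datetime(["2026-01-01", "2026-01-02", "2026-03-01"]),
    "activated_7d": [1, 0, 1],
})

# Drop cohorts that haven't had 7 full days to convert as of the analysis date
as_of = pd.Timestamp("2026-03-05")
mature = df[df["cohort_day"] <= as_of - pd.Timedelta(days=7)]

print(mature["activated_7d"].mean())  # only the two January cohorts survive → 0.5
```

Without the filter, the March cohort's 7-day window is still open and its rate is meaningless.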

Window functions worth knowing

Window functions are where SQL stops being "for loops" and starts being analytical. Three you should know cold:

-- Time-between-events with LAG
SELECT
  applicant_id,
  event_type,
  occurred_at,
  LAG(occurred_at) OVER (PARTITION BY applicant_id ORDER BY occurred_at) AS prev_event_at,
  EXTRACT(EPOCH FROM occurred_at - LAG(occurred_at)
           OVER (PARTITION BY applicant_id ORDER BY occurred_at)) AS seconds_since_prev
FROM onboarding.events;

-- First event per applicant with ROW_NUMBER
WITH ranked AS (
  SELECT *,
    ROW_NUMBER() OVER (PARTITION BY applicant_id ORDER BY occurred_at) AS rn
  FROM onboarding.events
  WHERE event_type = 'verified'
)
SELECT * FROM ranked WHERE rn = 1;

-- Running total with SUM OVER
SELECT
  DATE_TRUNC('day', occurred_at) AS day,
  COUNT(*) AS daily_verifications,
  SUM(COUNT(*)) OVER (ORDER BY DATE_TRUNC('day', occurred_at)) AS cumulative
FROM onboarding.events
WHERE event_type = 'verified'
GROUP BY 1;
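
The three window patterns above map cleanly onto pandas groupby operations, which helps when the same question resurfaces in a notebook. A sketch with a toy events frame:

```python
import pandas as pd

events = pd.DataFrame({
    "applicant_id": [1, 1, 1, 2],
    "event_type": ["id_submitted", "verified", "verified", "verified"],
    "occurred_at": pd.to_datetime([
        "2026-01-01 09:00", "2026-01-01 09:05",
        "2026-01-01 09:20", "2026-01-02 10:00",
    ]),
}).sort_values(["applicant_id", "occurred_at"])

# LAG: time since previous event, per applicant
events["seconds_since_prev"] = (
    events.groupby("applicant_id")["occurred_at"].diff().dt.total_seconds())

# ROW_NUMBER ... WHERE rn = 1: first 'verified' event per applicant
verified = events[events["event_type"] == "verified"]
first_verified = verified.groupby("applicant_id", as_index=False).first()

# SUM OVER (ORDER BY day): running total of daily verifications
daily = verified.groupby(verified["occurred_at"].dt.date).size()
cumulative = daily.cumsum()
```

`diff()` is LAG, `groupby(...).first()` on sorted data is ROW_NUMBER = 1, and `cumsum()` is the running SUM OVER.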

Reading an A/B test in SQL

Live problem: "We tested vendor B against vendor A. Did B improve approval rate?"

-- Assignment table tells you who got which variant
-- Be careful: only count post-assignment events
WITH assignments AS (
  SELECT applicant_id, variant, assigned_at
  FROM experiments.vendor_ab
  WHERE experiment_id = 'idv_vendor_2026q1'
),
outcomes AS (
  SELECT
    a.applicant_id,
    a.variant,
    MAX(CASE WHEN e.event_type = 'approved' AND e.occurred_at > a.assigned_at THEN 1 ELSE 0 END) AS approved
  FROM assignments a
  LEFT JOIN onboarding.events e USING (applicant_id)
  GROUP BY a.applicant_id, a.variant
)
SELECT
  variant,
  COUNT(*)                                       AS n,
  SUM(approved)                                  AS approvals,
  ROUND(100.0 * SUM(approved) / COUNT(*), 2)     AS approval_rate_pct
FROM outcomes
GROUP BY variant;

The trap most candidates fall into: counting outcomes that happened before assignment, or counting users who saw both variants because the assignment broke. Naming the trap shows seniority.

You won't be expected to compute statistical significance live by hand. You should be able to say "I'd compute a two-proportion z-test or use the bootstrapped CI" and reach for the right tool.
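
If you do want to sanity-check significance in a notebook, a two-proportion z-test is a few lines (statsmodels ships `proportions_ztest` if you'd rather not hand-roll it). A sketch with made-up approval counts:

```python
from math import sqrt, erf

def two_proportion_z(x_a, n_a, x_b, n_b):
    """Two-sided z-test for a difference in two proportions."""
    p_a, p_b = x_a / n_a, x_b / n_b
    p_pool = (x_a + x_b) / (n_a + n_b)           # pooled rate under H0
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    normal_cdf = 0.5 * (1 + erf(abs(z) / sqrt(2)))
    return z, 2 * (1 - normal_cdf)               # two-sided p-value

# Made-up counts: vendor A approves 4,100 of 5,000; vendor B approves 4,250 of 5,000
z, p = two_proportion_z(4100, 5000, 4250, 5000)  # z ≈ 4.0, comfortably significant
```

Knowing the shape of the test — pooled rate, standard error, z — is what lets you sanity-check the number a tool hands you.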

Notebook fluency

Jupyter / Hex / Mode / Deepnote notebooks are the standard PM analysis environment. What you need to be able to do:

  • Open a notebook, connect to the data warehouse, write a query, plot it.
  • Use pandas for slicing — groupby, merge, pivot_table, resample.
  • Use a plotting library — matplotlib or plotly — well enough for an exploratory chart, not production-pretty.
  • Compute basic stats — mean, median, percentiles, scipy.stats for a z-test or t-test.

A minimal notebook cell in that spirit:

import pandas as pd
import matplotlib.pyplot as plt

# Funnel data from SQL
df = pd.read_sql(funnel_sql, engine)

# Stage-to-stage conversion
df['id_submit_rate']  = df['id_submitted']  / df['started']
df['verify_rate']     = df['verified']      / df['id_submitted']
df['approve_rate']    = df['approved']      / df['verified']
df['activation_rate'] = df['activated']     / df['approved']

# Plot one stage's conversion by cohort and segment
pivot = df.pivot_table(index='cohort_week', columns='segment',
                       values='activation_rate', aggfunc='mean')
pivot.plot(figsize=(10, 5), title='Activation rate (approved → first deposit) by cohort & segment')
plt.ylabel('activation rate')
plt.show()

Looker / Mode / Hex craft

What the BI fluency check actually is:

  • Can you read a Looker LookML model and tell whether it's reasonable? (You don't need to write LookML; you do need to recognize what a "view," "explore," and "dimension" are.)
  • Can you build a Mode or Hex report someone else can rerun?
  • Can you spot a metric defined inconsistently across two dashboards?
  • Can you write a dashboard's underlying SQL clean enough that a DS partner can extend it without rewriting?

The PM who pastes a screenshot of a dashboard into a doc is unhelpful. The PM who pastes a screenshot plus the dashboard URL plus the query plus the population definition is the PM who gets trusted.

Population definition discipline

The single most important habit to demonstrate in a PM analytics round. Before any number, you state the population.

Common ambiguity, and the discipline that replaces it:

  • "Activation rate" — "% of applicants who first-deposit within 7 days of being approved, cohorted by week of approval, excluding the most recent 7 days because they haven't had time to convert."
  • "Verified users" — "Applicants in state 'approved' whose most recent re-screening did not raise a hit."
  • "Onboarding time" — "Median wall-clock from first applicant event to first 'approved' event, excluding applicants who never reach approved."
  • "Fraud rate" — "Count of accounts blocked-for-fraud within 90 days of approval / count of approved accounts, by approval cohort week."

Whenever you say a metric, restate the population. This is the fastest credibility signal in an analytics round.

Don't ship a chart you can't reproduce

The hardest discipline: every chart you share has a query attached. The query is in source control or the dashboard tool, not your local notebook. A reviewer can rerun it and get the same number a week later.

  • Never paste a screenshot without a link.
  • Never quote a number without naming the dashboard or query.
  • Never ship a metric in a PRD without the SQL definition behind it.
  • Version-control your important queries — they're product artifacts.

In the interview

If asked to walk through how you'd measure something, narrate the population first, the metric next, the query third, and the trap-you-would-avoid last. That's the cadence of a senior PM.