Analytics Fundamentals — SQL & Notebook Fluency for PMs
The "coding" round for a Platform PM is rarely LeetCode. It's SQL on a funnel, a cohort, an A/B test, or an instrumentation gap. Plus notebook and BI tool fluency. This chapter covers the patterns you'll be tested on and the discipline that separates "PM who uses SQL" from "PM who can hold their own with a data scientist."
Why a Platform PM needs SQL
You're going to be the one defining metrics for the platform. Definitions live in queries. The PM who hands off "give me the activation rate" without specifying the population gets back a number that is defensible to the data scientist but not what they meant. The PM who writes the query gets the truth.
Plus, you'll be asked. Most senior PM loops at fintech / crypto include a SQL round. Sometimes a take-home. Often live.
- You don't need to be a data engineer.
- You do need to write a clean funnel, a clean cohort, and a clean A/B test analysis without hand-holding.
- You should know enough to spot a bad number when one is presented to you.
SQL survival kit
The 80% of SQL that gets PMs through:
- `SELECT` + `WHERE` + `GROUP BY` + `HAVING` + `ORDER BY` — basics.
- `JOIN` shapes — `INNER`, `LEFT`, `FULL OUTER`. Know what each does to row counts.
- `CASE WHEN` for derived buckets.
- `COUNT`, `COUNT DISTINCT`, `SUM`, `AVG`, `PERCENTILE_CONT`.
- Date functions — `DATE_TRUNC`, `DATE_DIFF`, `INTERVAL`.
- CTEs (`WITH ... AS`) — chain steps readably.
- Window functions — `ROW_NUMBER`, `LAG`, `LEAD`, `SUM OVER`.
- `NULL` handling — `COALESCE`, `IS NULL`, behavior in joins.
The Snowflake / BigQuery / Postgres flavor variations are real but small. Don't memorize three; learn one well and translate.
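To make the kit concrete, here is a toy query (Postgres flavor) that exercises several of those constructs against the `onboarding.applicants` table used throughout this chapter. The specific region values are invented for illustration:
-- Weekly applicant counts, bucketed by region (region values are hypothetical)
SELECT
  DATE_TRUNC('week', created_at) AS signup_week,
  CASE
    WHEN region IS NULL THEN 'unknown'
    WHEN region IN ('US', 'GB', 'DE') THEN 'core'
    ELSE 'long_tail'
  END AS region_bucket,
  COALESCE(segment, 'unsegmented') AS segment,
  COUNT(DISTINCT applicant_id) AS applicants
FROM onboarding.applicants
GROUP BY 1, 2, 3
ORDER BY 1 DESC, 2, 3;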
The funnel-analysis pattern
Onboarding is a funnel. Almost every analytics task in this seat starts as "what's our funnel doing?" The pattern, written cleanly:
-- Onboarding funnel, weekly cohort, by segment
WITH starts AS (
SELECT
applicant_id,
DATE_TRUNC('week', created_at) AS cohort_week,
segment,
region
FROM onboarding.applicants
WHERE created_at >= '2026-01-01'
),
events AS (
SELECT applicant_id, event_type, occurred_at
FROM onboarding.events
WHERE event_type IN (
'id_submitted', 'verified', 'approved', 'first_deposit'
)
)
SELECT
s.cohort_week,
s.segment,
s.region,
COUNT(DISTINCT s.applicant_id) AS started,
COUNT(DISTINCT CASE WHEN e.event_type = 'id_submitted' THEN s.applicant_id END) AS id_submitted,
COUNT(DISTINCT CASE WHEN e.event_type = 'verified' THEN s.applicant_id END) AS verified,
COUNT(DISTINCT CASE WHEN e.event_type = 'approved' THEN s.applicant_id END) AS approved,
COUNT(DISTINCT CASE WHEN e.event_type = 'first_deposit' THEN s.applicant_id END) AS activated
FROM starts s
LEFT JOIN events e USING (applicant_id)
GROUP BY 1, 2, 3
ORDER BY 1 DESC, 2, 3;
Notes a senior PM would call out when reviewing this:
- Population: "started" is anyone who created an applicant record after 2026-01-01. That's a choice. Could also be "anyone who confirmed email." Be explicit.
- Counting distinct applicants avoids double-counting if events fire twice.
- Cohort by week of start means each cohort has a fixed N, so rates are stable to compare.
- LEFT JOIN keeps applicants who never produced events — drop-off shows correctly.
- Cohort window: filter out the most recent cohorts, which haven't had time to convert. Otherwise "this week's cohort looks bad" is a measurement artifact; a minimal version of the guard is sketched below.
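Assuming a 7-day conversion window, the guard is one extra predicate on the population (the retention query in the next section applies the same cap):
-- Only count cohorts old enough for the 7-day window to have closed
SELECT
  DATE_TRUNC('week', created_at) AS cohort_week,
  COUNT(DISTINCT applicant_id) AS started
FROM onboarding.applicants
WHERE created_at >= '2026-01-01'
  AND created_at <= CURRENT_DATE - INTERVAL '7 days'
GROUP BY 1
ORDER BY 1 DESC;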
The cohort retention pattern
Cohort retention by entry-path — useful for "do users who came in via Pro convert differently?"
-- D7 activation by acquisition source
WITH cohorts AS (
SELECT
applicant_id,
DATE(created_at) AS cohort_day,
acquisition_source AS source
FROM onboarding.applicants
WHERE created_at BETWEEN '2026-01-01' AND CURRENT_DATE - INTERVAL '7 days'
),
activated AS (
SELECT
applicant_id,
MIN(occurred_at) AS first_activation
FROM onboarding.events
WHERE event_type = 'first_deposit'
GROUP BY applicant_id
)
SELECT
c.source,
COUNT(*) AS cohort_n,
COUNT(CASE WHEN a.first_activation <= c.cohort_day + INTERVAL '7 days' THEN 1 END) AS activated_7d,
ROUND(100.0 * COUNT(CASE WHEN a.first_activation <= c.cohort_day + INTERVAL '7 days' THEN 1 END)
/ NULLIF(COUNT(*), 0), 2) AS d7_activation_pct
FROM cohorts c
LEFT JOIN activated a USING (applicant_id)
GROUP BY 1
ORDER BY cohort_n DESC;
Why the `created_at <= CURRENT_DATE - INTERVAL '7 days'` cap? Because cohorts younger than 7 days haven't had time to activate, so including them drags the apparent rate down. That's a discipline most candidates miss.
Window functions worth knowing
Window functions are where SQL stops being "for loops" and starts being analytical. Three you should know cold:
-- Time-between-events with LAG
SELECT
applicant_id,
event_type,
occurred_at,
LAG(occurred_at) OVER (PARTITION BY applicant_id ORDER BY occurred_at) AS prev_event_at,
EXTRACT(EPOCH FROM occurred_at - LAG(occurred_at)
OVER (PARTITION BY applicant_id ORDER BY occurred_at)) AS seconds_since_prev
FROM onboarding.events;
-- First event per applicant with ROW_NUMBER
WITH ranked AS (
SELECT *,
ROW_NUMBER() OVER (PARTITION BY applicant_id ORDER BY occurred_at) AS rn
FROM onboarding.events
WHERE event_type = 'verified'
)
SELECT * FROM ranked WHERE rn = 1;
-- Running total with SUM OVER
SELECT
DATE_TRUNC('day', occurred_at) AS day,
COUNT(*) AS daily_verifications,
SUM(COUNT(*)) OVER (ORDER BY DATE_TRUNC('day', occurred_at)) AS cumulative
FROM onboarding.events
WHERE event_type = 'verified'
GROUP BY 1;
Reading an A/B test in SQL
Live problem: "We tested vendor B against vendor A. Did B improve approval rate?"
-- Assignment table tells you who got which variant
-- Be careful: only count post-assignment events
WITH assignments AS (
SELECT applicant_id, variant, assigned_at
FROM experiments.vendor_ab
WHERE experiment_id = 'idv_vendor_2026q1'
),
outcomes AS (
SELECT
a.applicant_id,
a.variant,
MAX(CASE WHEN e.event_type = 'approved' AND e.occurred_at > a.assigned_at THEN 1 ELSE 0 END) AS approved
FROM assignments a
LEFT JOIN onboarding.events e USING (applicant_id)
GROUP BY a.applicant_id, a.variant
)
SELECT
variant,
COUNT(*) AS n,
SUM(approved) AS approvals,
ROUND(100.0 * SUM(approved) / COUNT(*), 2) AS approval_rate_pct
FROM outcomes
GROUP BY variant;
The trap most candidates fall into: counting outcomes that happened before assignment, or counting users who saw both variants because the assignment broke. Naming the trap shows seniority.
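One integrity check worth running before you trust the readout, written against the same assignment table:
-- Applicants assigned to more than one variant; this should return zero rows
SELECT applicant_id, COUNT(DISTINCT variant) AS variants_seen
FROM experiments.vendor_ab
WHERE experiment_id = 'idv_vendor_2026q1'
GROUP BY applicant_id
HAVING COUNT(DISTINCT variant) > 1;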
You won't be expected to compute statistical significance live by hand. You should be able to say "I'd compute a two-proportion z-test or use the bootstrapped CI" and reach for the right tool.
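That said, the z-statistic itself is simple arithmetic, and you can sanity-check it without leaving the warehouse. A sketch, assuming the outcomes CTE above is materialized as a hypothetical experiments.vendor_ab_outcomes table with variants labeled 'A' and 'B'; |z| above roughly 1.96 corresponds to p < 0.05, two-sided:
-- Two-proportion z-statistic (normal approximation); in a notebook you'd
-- reach for scipy.stats or statsmodels instead
WITH per_variant AS (
  SELECT variant, COUNT(*) AS n, AVG(approved) AS p
  FROM experiments.vendor_ab_outcomes  -- hypothetical materialization of "outcomes"
  GROUP BY variant
),
pair AS (
  SELECT
    a.n AS n1, a.p AS p1,
    b.n AS n2, b.p AS p2,
    (a.n * a.p + b.n * b.p) / (a.n + b.n) AS p_pool
  FROM per_variant a
  JOIN per_variant b ON a.variant = 'A' AND b.variant = 'B'
)
SELECT
  ROUND((p1 - p2) / SQRT(p_pool * (1 - p_pool) * (1.0 / n1 + 1.0 / n2)), 3) AS z
FROM pair;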
Notebook fluency
Jupyter / Hex / Mode / Deepnote notebooks are the standard PM analysis environment. What you need to be able to do:
- Open a notebook, connect to the data warehouse, write a query, plot it.
- Use `pandas` for slicing — `groupby`, `merge`, `pivot_table`, `resample`.
- Use a plotting library — `matplotlib` or `plotly` — well enough for an exploratory chart, not production-pretty.
- Compute basic stats — mean, median, percentiles, `scipy.stats` for a z-test or t-test.
import pandas as pd
import matplotlib.pyplot as plt
# Funnel data from SQL; funnel_sql is the funnel query from earlier,
# engine is an assumed SQLAlchemy connection to the warehouse
df = pd.read_sql(funnel_sql, engine)
# Stage-to-stage conversion
df['id_submit_rate'] = df['id_submitted'] / df['started']
df['verify_rate'] = df['verified'] / df['id_submitted']
df['approve_rate'] = df['approved'] / df['verified']
df['activation_rate'] = df['activated'] / df['approved']
# Plot activation rate by cohort week, one line per segment
pivot = df.pivot_table(index='cohort_week', columns='segment',
                       values='activation_rate', aggfunc='mean')
pivot.plot(figsize=(10, 5), title='Activation rate by cohort & segment')
plt.ylabel('activation rate')
plt.show()
Looker / Mode / Hex craft
What the BI fluency check actually is:
- Can you read a Looker LookML model and tell whether it's reasonable? (You don't need to write LookML; you do need to recognize what a "view," "explore," and "dimension" are.)
- Can you build a Mode or Hex report someone else can rerun?
- Can you spot a metric defined inconsistently across two dashboards?
- Can you write a dashboard's underlying SQL clean enough that a DS partner can extend it without rewriting?
The PM who pastes a screenshot of a dashboard into a doc is unhelpful. The PM who pastes a screenshot plus the dashboard URL plus the query plus the population definition is the PM who gets trusted.
Population definition discipline
The single most important habit to demonstrate in a PM analytics round. Before any number, you state the population.
| Common ambiguity | The discipline |
|---|---|
| "Activation rate" | "% of applicants who first-deposit within 7 days of being approved, cohorted by week of approval, excluding the most recent 7 days because they haven't had time to convert." |
| "Verified users" | "Applicants in state 'approved' whose most recent re-screening did not raise a hit." |
| "Onboarding time" | "Median wall-clock from first applicant event to first 'approved' event, excluding applicants who never reach approved." |
| "Fraud rate" | "Count of accounts blocked-for-fraud within 90 days of approval / count of approved accounts, by approval cohort week." |
Whenever you say a metric, restate the population. This is the fastest credibility signal in an analytics round.
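As a worked example, here is the first row of that table written as the query you would attach to it, using the same event schema as the funnel examples and taking each applicant's first 'approved' and 'first_deposit' events:
-- "Activation rate": % of approved applicants who first-deposit within
-- 7 days of approval, by approval week, excluding immature cohorts
WITH approvals AS (
  SELECT applicant_id, MIN(occurred_at) AS approved_at
  FROM onboarding.events
  WHERE event_type = 'approved'
  GROUP BY applicant_id
),
deposits AS (
  SELECT applicant_id, MIN(occurred_at) AS first_deposit_at
  FROM onboarding.events
  WHERE event_type = 'first_deposit'
  GROUP BY applicant_id
)
SELECT
  DATE_TRUNC('week', ap.approved_at) AS approval_week,
  COUNT(*) AS approved_n,
  ROUND(100.0 * COUNT(CASE WHEN d.first_deposit_at <= ap.approved_at + INTERVAL '7 days' THEN 1 END)
        / NULLIF(COUNT(*), 0), 2) AS activation_rate_pct
FROM approvals ap
LEFT JOIN deposits d USING (applicant_id)
WHERE ap.approved_at <= CURRENT_DATE - INTERVAL '7 days'
GROUP BY 1
ORDER BY 1 DESC;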
Don't ship a chart you can't reproduce
The hardest discipline: every chart you share has a query attached. The query is in source control or the dashboard tool, not your local notebook. A reviewer can rerun it and get the same number a week later.
- Never paste a screenshot without a link.
- Never quote a number without naming the dashboard or query.
- Never ship a metric in a PRD without the SQL definition behind it.
- Version-control your important queries — they're product artifacts.
If asked to walk through how you'd measure something, narrate the population first, the metric next, the query third, and the trap-you-would-avoid last. That's the cadence of a senior PM.