# Start Here
Interview prep for Data Scientist loops at AI companies — calibrated against two specific roles where the work is product analytics, experimentation, and (for the senior role) analytics leadership.
## The two roles, in plain English
This guide is calibrated against two open Data Scientist roles that look superficially different but actually share a spine:
- Founding product Data Scientist at an AI SaaS company (mid-level, ~3+ years). Define the experimentation stack, build the BI layer, instrument the product, run A/B tests, and set the cultural bar for data-informed decisions.
- Lead Data Scientist on an analytics & insights team at a frontier enterprise-AI company (senior IC + manager). Design experimentation programs (A/B, multi-armed bandits, causal inference), build predictive models for forecasting/segmentation/propensity/opportunity sizing, manage analysts, partner across product/research/sales/finance to shape GTM strategy.
The founding-DS role is "be the first DS, build the foundation." The analytics-leadership role is "lead the analytics team at a frontier AI company." Different scopes, same spine: SQL fluency, rigorous experimentation, predictive modeling for business decisions, and translating ambiguous business questions into rigorous analytical recommendations.
Both JDs emphasize ambiguity, ownership, and stakeholder communication far more than ML novelty. Some JDs literally say "make smart people faster"; others say "lay the foundation." Neither asks for fancy models — both ask for the analytical lifecycle from question to production-grade answer.
## What the rounds typically test
Loops for product/analytics DS roles at AI companies usually include:
- SQL screen — one or two timed SQL problems. Window functions, cohorts, funnels, gaps-and-islands. Often the gatekeeper round.
- Experimentation / stats — design an A/B test from a fuzzy product prompt; defend your power calculation; reason about peeking, p-hacking, network effects. At the senior-IC level, expect causal-inference questions (DiD, propensity, IV).
- Product sense / metrics — "metric X dropped 5% — what's going on?", "how would you measure success for feature Y?", "what's the right north star for this product?"
- Predictive modeling — design a forecasting / segmentation / propensity model from a business question. Less "what loss function" and more "how do you frame this, how do you validate, how do you ship it."
- Coding — Python data manipulation (pandas / dicts / lists), light algorithms. Less LeetCode-heavy than software roles but you still need to be fluent.
- Behavioral / leadership — at the enterprise-AI company especially: a "tell me about leading a team through ambiguity" round is essentially guaranteed.
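The gaps-and-islands pattern from the SQL screen is worth having in muscle memory: subtract a row number from the date and every consecutive run collapses to a constant grouping key. A minimal sketch, run against SQLite through Python's stdlib (the `logins` table and its columns are invented for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE logins (user_id TEXT, login_date TEXT);
INSERT INTO logins VALUES
  ('a','2024-01-01'), ('a','2024-01-02'), ('a','2024-01-03'),
  ('a','2024-01-05'), ('b','2024-01-02');
""")

# Row-number trick: date minus rn is constant within each streak,
# so grouping by it isolates the "islands" of consecutive days.
query = """
WITH numbered AS (
  SELECT user_id, login_date,
         ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY login_date) AS rn
  FROM logins
)
SELECT user_id,
       MIN(login_date) AS streak_start,
       MAX(login_date) AS streak_end,
       COUNT(*)        AS streak_len
FROM numbered
GROUP BY user_id, DATE(login_date, '-' || rn || ' days')
ORDER BY user_id, streak_start;
"""
rows = conn.execute(query).fetchall()
# User 'a' gets a 3-day streak (Jan 1-3) and a 1-day streak (Jan 5).
```

The same trick answers "longest login streak per user" by wrapping this in a `MAX(streak_len)` aggregation.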
This guide covers all six.
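When the experimentation round asks you to defend a power calculation, the normal-approximation formula for a two-proportion test is usually all you need. A minimal sketch (assumes a two-sided test, a 50/50 split, and independent units; the function name is ours, not from any library):

```python
import math
from statistics import NormalDist

def sample_size_per_arm(p_baseline, mde_abs, alpha=0.05, power=0.8):
    """Sample size per arm to detect an absolute lift of mde_abs
    over p_baseline, via the two-sided z-test normal approximation."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # ~1.96 at alpha=0.05
    z_beta = NormalDist().inv_cdf(power)           # ~0.84 at power=0.8
    p_treat = p_baseline + mde_abs
    variance = p_baseline * (1 - p_baseline) + p_treat * (1 - p_treat)
    return math.ceil((z_alpha + z_beta) ** 2 * variance / mde_abs ** 2)
```

Useful for the classic follow-up: detecting a 1pp lift on a 10% baseline needs roughly 15k users per arm, which is why "can we even power this test?" is often the first thing to say out loud.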
## The folder, in reading order
The numbering follows the order you should read in. Five sections:
### Section A — Orient (read first)
| File | Why |
|---|---|
| 01-the-roles | Decode what each role actually involves — founding-DS and analytics-leadership side by side |
| 02-positioning-from-scratch | Mindset before content — how to interview honestly when your background doesn't perfectly match the JD |
### Section B — Core DS concepts (the technical core)
| File | Why |
|---|---|
| 03-sql-for-product-analytics | Window functions, cohorts, funnels, retention, the SQL gotchas every loop tests |
| 04-experimentation-foundations | A/B testing math, sample size, peeking, designing experiments from product questions |
| 05-advanced-experimentation | MABs, sequential testing, CUPED, switchback, network effects, ratio metrics |
| 06-causal-inference | DiD, IV, propensity, RDD, synthetic controls — named as a Lead-DS expectation |
| 07-product-metrics | North star, funnels, retention, leading vs lagging, guardrails |
| 08-predictive-modeling-business | Forecasting, segmentation, propensity, opportunity sizing — for business decisions, not ML for its own sake |
| 09-analytics-leadership | Leading and growing an analytics team — required for analytics-leadership, skim if founding-DS focused |
### Section C — Coding (DSA)
| File | Why |
|---|---|
| 10-coding-fundamentals | Python patterns DS interviewers reach for |
| 11-coding-problems | Drillable Python + SQL problems with multiple approaches |
### Section D — Production / Cloud
| File | Why |
|---|---|
| 12-data-stack-design | The "lay the foundation" prompt — defining the warehouse, event tracking, and experimentation stack at an early-stage company. Founding-DS-flavored. |
| 13-dashboards-and-bi | Self-serve BI, metric layers, dashboards that get used vs ignored |
### Section E — Reference & Execution
| File | Why |
|---|---|
| 14-llm-domain-context | Vocabulary you need to be credible at an AI-native company |
| 15-interview-questions | ~30 Q&A drill set with hide-show answers |
| 16-day-of | Structural moves, recovery patterns, closing statement. Reread morning of. |
## Study schedule
Three calibrations depending on time available:
### If you have 7+ days
- Day 1: 01, 02 (orient) → 03 (SQL — the universal screen)
- Day 2: 04, 05 (experimentation)
- Day 3: 06 (causal), 07 (metrics)
- Day 4: 08 (predictive modeling), 09 (leadership — for analytics-leadership)
- Day 5: 10, 11 (coding + SQL drills on a timer)
- Day 6: 12, 13 (stack & BI), 14 (LLM context)
- Day 7: Drill 15. Reread 16. Sleep.
### If you have 2–3 days
01, 02, 03 (SQL), 04 (experimentation foundations), 06 (causal — at least pass through propensity + DiD), 07 (metrics), 11 (drill 4–5 problems), 15 (drill), 16. Skim everything else.
### If you have < 24 hours
01, 02, the 03 SQL gotchas section only, 04 (just the design-an-experiment recipe), 07, 15 (drill all questions), 16. Skim 06 headings only.
## The single most important reframe
Both JDs are testing your ability to turn ambiguous business questions into rigorous analytical recommendations. They're not testing your knowledge of every loss function or every Bayesian variant. They want someone who can sit with a half-formed product question, frame it, design an analysis or experiment, run it cleanly, and write the recommendation that gets acted on.
Many leadership JDs say this almost verbatim: "A proven ability to turn ambiguous business questions into rigorous analytical problems, with clear and compelling recommendations to match." Others say it as "translate complex analyses into clear, visual narratives that align stakeholders." Same skill, different wording.
What this means tactically:
- When you get a vague prompt ("metric X dropped — why?"), don't dive into the answer. Frame the question first. Out loud. Show the interviewer you can decompose.
- When you get a modeling prompt, lead with "what decision is this model going to inform?" Don't lead with the algorithm.
- When you give a recommendation, give a recommendation. Not "it depends, here are five options." Pick one. Caveat it. Defend it.
## What winning looks like
You don't need to be the strongest pure modeler in the loop. You need to be the candidate who:
- Writes clean SQL fast, including window functions and self-joins, without flinching.
- Can design an A/B test from a vague product prompt — including sample size, guardrails, and what would make you halt it.
- Can reason about confounders and pick the right quasi-experimental method when randomization isn't possible.
- Talks about metrics like a product manager would — north star, leading indicators, guardrails, sensitivity — not like a textbook would.
- Frames recommendations with a confidence level and a "what would change my mind" sentence.
- (For analytics-leadership roles specifically) tells team-leadership stories with concrete artifacts — how you raised the bar, what you mentored someone through, what you killed and why.
If you can do those six things on demand, you're in.
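For the quasi-experimental bullet, the difference-in-differences point estimate is the one to be able to write on a whiteboard. A minimal sketch from group means only (real analyses use a regression with clustered standard errors, and this is valid only under parallel trends; the function name is ours):

```python
def did_estimate(treat_pre, treat_post, ctrl_pre, ctrl_post):
    """Difference-in-differences point estimate from four group means.
    The control group's pre-to-post change stands in for what would have
    happened to the treated group absent treatment (parallel trends)."""
    return (treat_post - treat_pre) - (ctrl_post - ctrl_pre)
```

For example, if the treated group moves 10 to 16 while the control moves 10 to 12, the estimated treatment effect is 4, not 6 — the extra 2 is the secular trend the control absorbs.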