Evaluation & Quality — Metrics for an Onboarding Platform
How a Platform PM measures whether the platform is working: onboarding funnel metrics, team-velocity metrics, vendor economics, the false-reject vs false-accept trade-off, and SLAs that treat the platform as a product.
Two customers, two metric stacks
The platform serves end users (indirectly) and internal teams (directly). The metric stack reflects both. Don't pick one and ignore the other.
| Stack | What you measure | Who cares |
|---|---|---|
| End-user / business outcome | Activation, time-to-complete, conversion by segment/region, abandonment | Growth, exec, finance |
| Platform / internal-team | Time-to-launch, adoption breadth, IDX scorecard, SLA | Consumer-team PMs, eng leadership |
| Operational / risk | False-accept, false-reject, vendor cost, manual-review volume, escalation rate | Compliance, ops, finance |
A confident Platform PM names all three stacks in the first 30 seconds of a metrics question.
North-star options for an onboarding platform
You're going to be asked: "What's your north star?" There's no single right answer. Strong candidates name one, defend it, then critique it. Three reasonable options:
- "Time-to-launch a new onboarding configuration" — the most platform-shaped. Defended by: it directly measures the team-as-customer outcome, it's easy to attribute, it's a leading indicator for business agility. Critique: it's not directly tied to revenue, and it can be gamed by trivial launches.
- "Verified-customer activation rate" — the most business-shaped. Defended by: ties to revenue, easy to share with finance. Critique: too far downstream for a platform team to clearly own.
- "Successful verifications per dollar of vendor cost" — the most operational. Defended by: clean unit economic. Critique: ignores the team-velocity dimension entirely.
Pragmatic move: pick one primary (usually time-to-launch), pair it with two supporting metrics (activation and cost per successful verification), and explicitly state the guardrails (false-accept rate, customer-complaint volume).
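To ground the pragmatic move, here's a minimal sketch of all three candidate computations from simple period aggregates. The argument names are illustrative, not a real schema:

```python
from statistics import median

def north_star_candidates(launch_lead_days, approved, activated,
                          vendor_spend, successful_verifications):
    """Compute the three north-star candidates from simple aggregates.

    launch_lead_days: days from kickoff to first verified customer,
                      one entry per configuration launched this period.
    The rest are plain counts / dollar totals for the same period.
    """
    return {
        # Platform-shaped: median lead time to launch a configuration.
        "time_to_launch_days_p50": median(launch_lead_days),
        # Business-shaped: share of approved customers who activate.
        "activation_rate": activated / approved if approved else 0.0,
        # Operational: vendor spend per successful verification.
        "cost_per_successful_verification":
            vendor_spend / successful_verifications
            if successful_verifications else float("inf"),
    }

print(north_star_candidates(
    launch_lead_days=[12, 30, 9], approved=10_000, activated=4_200,
    vendor_spend=85_000.0, successful_verifications=62_000))
```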
Onboarding funnel metrics — the basics
The funnel framing every Platform PM should be able to draw on a whiteboard:
| Step | Metric | What "good" usually looks like |
|---|---|---|
| Started | Sign-up entries / day, by source & segment | Stable, trending up |
| Identity submitted | % of starts that submit ID | 60-80%; varies with friction |
| Verified | % of submissions that pass first-vendor decision | 70-90% |
| Approved (post-screening) | % of verified that pass sanctions / PEP / adverse-media screening | 95%+ (most hits are false positives) |
| Activated | % of approved that fund / first-trade within N days | 30-60% |
Every transition has its own "why did they drop?" answer. Instrument each one separately. A common platform-PM finding: the largest single drop is between "started" and "identity submitted" — i.e. the user gave up before uploading a document.
Funnel rates aggregated across all time hide everything. Cohort by week of start, and by segment × region. The platform that "looks fine globally" is often quietly failing in one country or one segment.
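A minimal sketch of the cohorted funnel in pandas, assuming one row per applicant with a timestamp for each step reached (column names are illustrative):

```python
import pandas as pd

# One row per applicant; NaT in a step column means they never reached it.
events = pd.DataFrame({
    "applicant_id": [1, 2, 3, 4],
    "segment": ["retail", "retail", "pro", "retail"],
    "region": ["DE", "DE", "SG", "SG"],
    "started_at":   pd.to_datetime(["2024-05-01", "2024-05-02", "2024-05-08", "2024-05-09"]),
    "submitted_at": pd.to_datetime(["2024-05-01", None,         "2024-05-08", "2024-05-09"]),
    "verified_at":  pd.to_datetime(["2024-05-02", None,         "2024-05-09", None]),
})

steps = ["started_at", "submitted_at", "verified_at"]
events["cohort_week"] = events["started_at"].dt.to_period("W")

# Per cohort-week x segment x region: each step's conversion vs starts.
grouped = events.groupby(["cohort_week", "segment", "region"])
reached = grouped[steps].count()                  # applicants reaching each step
rates = reached.div(reached["started_at"], axis=0).round(2)
print(rates)
```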
Platform / team-velocity metrics
The metrics that make the platform's case to leadership:
- Lead time to add a new market — kickoff to first verified customer. Target: declining quarter-over-quarter. (A lead-time sketch follows this list.)
- Lead time to add a new IDV vendor — contract signed to first production traffic. Target: weeks, not quarters.
- Lead time to change a policy — Compliance decides → live in prod. Target: days, with same-day for emergencies.
- Number of teams onboarded to the platform, with a count of net-new flows launched on-platform vs forked.
- Internal-customer NPS — quarterly pulse to consumer-team PMs and tech leads.
- Platform-team support load — tickets per consumer team per month, ideally declining as docs improve.
- Migration progress — % of legacy flows retired.
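A sketch of the lead-time computation behind the first three bullets, assuming launches are logged with a kickoff date and a go-live date (the log shape is an assumption, not a real schema):

```python
from datetime import date

# Hypothetical launch log: (what launched, kickoff, first production use).
launches = [
    ("market:BR", date(2024, 1, 10), date(2024, 3, 4)),
    ("vendor:acme_idv", date(2024, 2, 1), date(2024, 2, 22)),
    ("policy:pep_rules_v3", date(2024, 3, 5), date(2024, 3, 7)),
]

def lead_time_days(kind_prefix):
    """Lead times in days for launches whose kind starts with the prefix."""
    return [(live - kickoff).days
            for kind, kickoff, live in launches
            if kind.startswith(kind_prefix)]

for prefix in ("market", "vendor", "policy"):
    print(prefix, lead_time_days(prefix))
```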
Vendor economics & quality
The vendor layer is a big-ticket cost center and a big-ticket quality dimension. Metrics worth tracking per vendor:
| Metric | Definition |
|---|---|
| Cost per attempt | What you pay regardless of outcome |
| Cost per successful verification | Cost / approvals — the real unit economics |
| First-pass success rate | % of attempts that resolve without escalation |
| Decision latency p50 / p95 / p99 | How long from session creation to decision |
| Manual-review carry-over rate | % of vendor decisions that ops must re-review |
| Reject reason distribution | Where the vendor is over-conservative; where its coverage falls short |
| Availability / error rate | 5xx, timeouts, SDK errors |
The dashboard that wins vendor negotiations: cost-per-successful-verification by vendor by jurisdiction by document type, sliced six months back. Bring this to the renewal conversation.
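That dashboard reduces to one pivot. A sketch in pandas, assuming a per-attempt log with vendor, jurisdiction, document type, cost, and outcome (all column names illustrative):

```python
import pandas as pd

attempts = pd.DataFrame({
    "vendor":       ["acme", "acme", "beta", "beta", "acme"],
    "jurisdiction": ["DE",   "DE",   "DE",   "SG",   "SG"],
    "doc_type":     ["passport", "passport", "passport", "id_card", "id_card"],
    "cost_usd":     [1.20, 1.20, 0.90, 0.95, 1.40],
    "approved":     [True, False, True, True, True],
})

# Cost per *successful* verification: total spend / approvals per cell.
pivot = attempts.pivot_table(
    index=["vendor", "jurisdiction"],
    columns="doc_type",
    values=["cost_usd", "approved"],
    aggfunc={"cost_usd": "sum", "approved": "sum"},
)
cost_per_success = (pivot["cost_usd"] / pivot["approved"]).round(2)
print(cost_per_success)
```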
False-reject vs false-accept — the trade-off you live with
Every verification system makes one of two kinds of error:
- False-reject — a legitimate customer fails. Hurts activation, hurts CSAT.
- False-accept — a fraudulent / sanctioned / underage user passes. Hurts the business legally and financially.
You can move the threshold, but you can't reduce both at once. Mature platforms tune the threshold per segment, per market, per signal mix — high-stakes products (Margin) trade more false-rejects for fewer false-accepts; low-stakes products (basic spot) do the reverse, within compliance constraints.
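A toy sketch of the trade-off, assuming each applicant gets a risk score and the true label is known after the fact:

```python
# Toy data: (risk_score, is_actually_fraudulent). Higher score = riskier.
applicants = [(0.05, False), (0.20, False), (0.35, False), (0.40, True),
              (0.55, False), (0.70, True), (0.80, False), (0.95, True)]

def error_rates(threshold):
    """Reject everyone at or above the threshold; measure both error kinds."""
    legit_scores = [s for s, is_fraud in applicants if not is_fraud]
    fraud_scores = [s for s, is_fraud in applicants if is_fraud]
    false_reject = sum(s >= threshold for s in legit_scores) / len(legit_scores)
    false_accept = sum(s < threshold for s in fraud_scores) / len(fraud_scores)
    return false_reject, false_accept

# Sweeping the threshold moves the two errors in opposite directions.
for t in (0.3, 0.5, 0.7, 0.9):
    fr, fa = error_rates(t)
    print(f"threshold={t:.1f}  false_reject={fr:.2f}  false_accept={fa:.2f}")
```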
Practical measurement:
- False-reject is observable: customer complaints, support tickets, manual-review overrides.
- False-accept is harder — only surfaces later as fraud, chargeback, regulatory escalation, or transaction-monitoring alert. Track downstream fraud rate by onboarding cohort.
- The platform should make threshold changes auditable and revertable (see the sketch below).
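One shape for auditable, revertable threshold changes: an append-only history where a revert is just a new entry, never an overwrite. A sketch, not a real config system:

```python
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass(frozen=True)
class ThresholdChange:
    segment: str
    threshold: float
    changed_by: str
    reason: str
    at: datetime

class ThresholdHistory:
    """Append-only: every change (including reverts) is a new entry."""
    def __init__(self):
        self._log: list[ThresholdChange] = []

    def set(self, segment, threshold, changed_by, reason):
        self._log.append(ThresholdChange(
            segment, threshold, changed_by, reason,
            datetime.now(timezone.utc)))

    def current(self, segment):
        for entry in reversed(self._log):
            if entry.segment == segment:
                return entry.threshold
        return None

    def revert(self, segment, changed_by, reason):
        previous = [e for e in self._log if e.segment == segment]
        if len(previous) >= 2:
            self.set(segment, previous[-2].threshold, changed_by, reason)

history = ThresholdHistory()
history.set("margin", 0.70, "pm@example.com", "tighten after fraud spike")
history.set("margin", 0.60, "pm@example.com", "loosen after FR complaints")
history.revert("margin", "pm@example.com", "complaints were unrelated")
print(history.current("margin"))  # 0.70 again, with a full audit trail
```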
The Growth team's instinct is to push for higher activation. The Compliance team's instinct is to push for stricter screening. Your job is to make the trade-off explicit and instrumented, not to pick a side. Bring data, not vibes.
Platform SLAs — onboarding as a product
A real platform publishes SLAs and lives by them. Reasonable shape for an onboarding platform:
| Surface | SLA |
|---|---|
| API availability | 99.95% monthly |
| Decision latency p95 | < 30 seconds for instant verifications |
| Event delivery | < 60 seconds end-to-end, at-least-once |
| Webhook retry contract | Up to 24h, exponential backoff, signed payloads |
| Time-to-acknowledge an incident | 15 minutes business hours, 1 hour after-hours |
| Policy-change lead time | Same-day for emergency, 5 business days standard |
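The webhook row implies a concrete delivery loop. A sketch of the platform side, assuming HMAC-SHA256 payload signing and capped exponential backoff (the header name, schedule, and secret handling are illustrative, not a published spec):

```python
import hashlib
import hmac
import json
import time
import urllib.request

SECRET = b"per-consumer-shared-secret"  # rotated per consumer team

def sign(payload: bytes) -> str:
    """HMAC-SHA256 signature the consumer verifies before trusting the event."""
    return hmac.new(SECRET, payload, hashlib.sha256).hexdigest()

def deliver(url: str, event: dict, max_hours: float = 24.0) -> bool:
    """At-least-once delivery: retry with exponential backoff for up to 24h."""
    payload = json.dumps(event).encode()
    delay, elapsed = 1.0, 0.0
    while elapsed <= max_hours * 3600:
        req = urllib.request.Request(
            url, data=payload, method="POST",
            headers={"Content-Type": "application/json",
                     "X-Signature": sign(payload)})
        try:
            with urllib.request.urlopen(req, timeout=10) as resp:
                if 200 <= resp.status < 300:
                    return True
        except OSError:
            pass  # network error or non-2xx (HTTPError): fall through to retry
        time.sleep(delay)
        elapsed += delay
        delay = min(delay * 2, 3600)  # double the wait, capped at one hour
    return False  # exhausted the 24h window; surface as an incident
```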
SLAs are commitments. Publish them, monitor them, and report against them. When you miss, write a postmortem and share it. That's what a platform-as-product looks like.
Guardrail metrics — what should never move
Guardrails are the metrics where regression is unacceptable, regardless of what the primary metric does. For onboarding:
- Sanctions hit rate — must not decline; a sudden drop usually means screening broke, not that risk disappeared.
- Audit-log completeness — every state transition has an event. Drop = compliance incident.
- PII leakage — count of PII-bearing fields landing in logs / analytics. Target: zero; a scan sketch follows this list.
- Cross-tenant data isolation — never expose one applicant's data via another's session.
- Customer-complaint rate — a spike means false-reject is rising even if approval rate looks fine.
Guardrails are the metrics that show up in a postmortem when something goes wrong. Naming them in advance is half the platform PM's value.
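The PII-leakage guardrail can be checked mechanically. A minimal scan sketch; the two regexes are illustrative and nowhere near exhaustive, and a real scanner covers many more field types:

```python
import re

# Illustrative patterns only: rough email match, national-ID-ish digit runs.
PII_PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "long_digit_run": re.compile(r"\b\d{9,}\b"),  # SSN/passport-ish runs
}

def count_pii_hits(log_lines):
    """Return hit counts per pattern; the guardrail target is all zeros."""
    hits = {name: 0 for name in PII_PATTERNS}
    for line in log_lines:
        for name, pattern in PII_PATTERNS.items():
            hits[name] += len(pattern.findall(line))
    return hits

sample = [
    "decision=approved applicant=a1f3 latency_ms=412",         # clean
    "debug: applicant email jane@example.com doc=9876543210",  # leak!
]
print(count_pii_hits(sample))  # {'email': 1, 'long_digit_run': 1}
```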
Metric anti-patterns to call out
- Vanity API-call counts — your platform got 10M calls; cool, but how many were successful, by which team, for which outcome?
- Goodhart's Law on activation — chase activation by lowering the screening bar, win the quarter, lose the year via fraud.
- Average decision latency — hides the long tail. Always report p95/p99.
- Aggregate funnel rates — hide regional / segment dynamics. Always cohort.
- "Adoption" measured as integrations — a team integrated and never shipped is not adoption.
- "We don't have data yet" — you have data. You haven't instrumented yet. Different problem.
A reasonable platform dashboard, sketched
If asked "what's on your dashboard?" — the answer that lands:
Sample dashboard shape
- Row 1 — Business outcome: activation rate (verified + funded) by segment by region, w/w and m/m.
- Row 2 — Funnel: drop-off by step, cohort by week, with annotations for product/policy changes.
- Row 3 — Vendor health: per-vendor success rate, latency p95, cost-per-successful, with a small alert flag if any move outside band.
- Row 4 — Platform health: API availability, event-pipeline lag, audit-event volume.
- Row 5 — Team velocity: open migration projects, new flows shipped, time-to-launch trend, IDX NPS.
- Row 6 — Guardrails: sanctions hit-rate, manual-review queue depth, customer-complaint volume.
One dashboard. Everyone in the chain reads it. You walk through it weekly with eng, monthly with execs.
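If pressed on how the dashboard stays honest, one answer is a declarative spec with alert bands that goes through code review like anything else. A sketch; the row numbers mirror the list above and the thresholds are invented placeholders:

```python
# Declarative dashboard spec: each panel names its metric, slice, and the
# band outside which it flags. Thresholds are invented placeholders.
DASHBOARD = [
    {"row": 1, "metric": "activation_rate", "slice": ["segment", "region"],
     "band": (0.30, None)},          # flag if activation drops below 30%
    {"row": 3, "metric": "vendor_latency_p95_s", "slice": ["vendor"],
     "band": (None, 30.0)},          # flag if p95 exceeds the 30s SLA
    {"row": 6, "metric": "sanctions_hit_rate", "slice": [],
     "band": (0.001, None)},         # flag if hits vanish: screening broke?
]

def breaches(panel, value):
    """True if the observed value falls outside the panel's alert band."""
    low, high = panel["band"]
    return (low is not None and value < low) or \
           (high is not None and value > high)

for panel in DASHBOARD:
    print(panel["metric"], "alert" if breaches(panel, 0.0) else "ok")
```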