# Data & Pipelines
A PM's-eye view of payments data — events, tokenization, reconciliation, ledgers. You don't build this, but you must understand it well enough to spec instrumentation and to read what comes out the other end.
## The payments event model
Payments are best modeled as an append-only event stream around a stable identity — the payment intent. Common events per intent:
| Event | When emitted |
|---|---|
| payment_intent.created | User initiates a deposit |
| payment_intent.requires_action | 3DS challenge required |
| auth.attempted | Request sent to PSP |
| auth.approved / auth.declined | PSP responds |
| auth.cascaded | Failover to an alternate PSP |
| capture.requested / capture.completed | If auth and capture are split |
| settlement.scheduled / settlement.completed | Funds moved by the acquirer |
| refund.requested / refund.completed | Refund flow |
| dispute.opened / dispute.evidence_submitted / dispute.resolved | Chargeback lifecycle |
| ledger.posted | Internal ledger entry written |
Every event carries the stable payment_intent_id plus a monotonically increasing sequence number. Downstream consumers reconstruct state by replaying events in order.
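The replay idea can be sketched in a few lines. This is a minimal illustration, not a real consumer: the tuple event shape and the event-to-state mapping below are hypothetical simplifications.

```python
# Hypothetical event shape: (payment_intent_id, sequence, event_type).
# Webhooks may arrive out of order, so sequence drives the replay.
EVENTS = [
    ("pi_123", 3, "capture.completed"),
    ("pi_123", 1, "payment_intent.created"),
    ("pi_123", 2, "auth.approved"),
]

# Simplified mapping from event type to the intent state it implies.
STATE_BY_EVENT = {
    "payment_intent.created": "created",
    "auth.approved": "authorized",
    "auth.declined": "declined",
    "capture.completed": "captured",
    "settlement.completed": "settled",
}

def replay(events):
    """Reconstruct current state per intent by replaying in sequence order."""
    state = {}
    for intent_id, seq, event_type in sorted(events, key=lambda e: (e[0], e[1])):
        if event_type in STATE_BY_EVENT:
            state[intent_id] = STATE_BY_EVENT[event_type]
    return state

print(replay(EVENTS))  # {'pi_123': 'captured'}
```

Because state is derived, a consumer can always be rebuilt from scratch by replaying the stream from sequence 1.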
## Tokenization — where raw card data ends and tokens begin
From a data-pipeline perspective:
- Raw PAN enters at the edge (iframe / SDK / hosted-fields).
- Vault receives it; returns a vault token (internal).
- Vault token may map to a scheme-issued network token (VTS/MDES) used downstream.
- Everything after the vault uses tokens, never PAN. PCI scope is limited to the vault and its callers.
- Analytics data should never contain PAN. The BIN (first 6-8 digits) is fine; the full PAN never lands in your warehouse.
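What the analytics-safe projection looks like can be sketched as follows. The function name and field names are illustrative; the point is that only the token, BIN, and last four digits ever leave the vault boundary.

```python
def analytics_view(pan: str, vault_token: str) -> dict:
    """Derive an analytics-safe record: token + BIN + last4, never the PAN.
    In practice this runs inside the PCI-scoped vault; downstream systems
    only ever see the safe projection it emits."""
    return {
        "card_token": vault_token,  # stable internal identifier
        "bin": pan[:6],             # first 6 digits; some schemes now use 8
        "last4": pan[-4:],
    }

record = analytics_view("4242424242424242", "tok_abc")
print(record)  # {'card_token': 'tok_abc', 'bin': '424242', 'last4': '4242'}
```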
## Auth and settlement events — they live in different streams
Two distinct sources:
- Authorization stream — synchronous, from PSP API responses. Real-time, per attempt.
- Settlement stream — asynchronous, from PSP daily files / SFTP / API pulls. Per batch.
These do not arrive at the same time: authorization is realtime, while settlement files often land on the morning of T+1. Reconciliation joins the two streams on intent ID and amount; mismatches signal problems such as missing settlements, amount deltas, or duplicates.
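The join itself is simple to sketch. Field names and record shapes here are hypothetical; the output is the three buckets a recon job cares about.

```python
def reconcile(auth_events, settlement_rows):
    """Join settlement rows to auth events on intent_id, compare amounts,
    and return the three recon buckets: matched, mismatched, missing."""
    auth_by_id = {a["intent_id"]: a for a in auth_events}
    matched, mismatched = [], []
    for row in settlement_rows:
        auth = auth_by_id.pop(row["intent_id"], None)
        if auth is None:
            mismatched.append(("settled_without_auth", row["intent_id"]))
        elif auth["amount"] == row["amount"]:
            matched.append(row["intent_id"])
        else:
            mismatched.append(("amount_delta", row["intent_id"]))
    # Authed but not yet settled -- often just T+1 lag, not an error.
    missing = list(auth_by_id)
    return matched, mismatched, missing

auths = [{"intent_id": "pi_1", "amount": 1000}, {"intent_id": "pi_2", "amount": 500}]
settlements = [{"intent_id": "pi_1", "amount": 1000}]
print(reconcile(auths, settlements))  # (['pi_1'], [], ['pi_2'])
```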
## Sources of truth — there are several
| Source | What it knows | Latency |
|---|---|---|
| PSP API responses | Auth attempt outcome | Realtime |
| PSP settlement report | Funds actually moved | T+1 typical |
| Acquirer settlement file | Funds reaching the acquirer | T+1 |
| Bank statement | Funds in your nostro | T+1 / EOD |
| Internal ledger | Customer balance state | Realtime (post-event) |
| Card scheme reports | Dispute / chargeback events | Days-weeks |
When sources disagree, your job is to define which wins for which question:
- "Was the customer credited?" → internal ledger.
- "Did the acquirer get paid?" → settlement report.
- "Did the funds hit our bank?" → bank statement.
- "Is the customer about to chargeback?" → scheme report.
## Daily reconciliation cycle
Every morning, the recon job runs:
- Pull yesterday's PSP settlement files (SFTP, API, or via aggregator).
- Parse and normalize into the canonical settlement event schema.
- Join to internal auth events on intent_id + amount.
- Compute three buckets: matched, mismatched, missing.
- Surface to finance / treasury for human review.
- Auto-resolve patterns we know (FX rounding deltas, fee adjustments).
- Open tickets for genuine mismatches.
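The auto-resolve step above is often a simple tolerance rule. A sketch, with an illustrative threshold (the real tolerance is a policy decision made with finance):

```python
def classify_delta(expected_minor: int, settled_minor: int,
                   fx_tolerance_minor: int = 2) -> str:
    """Classify an amount mismatch: tiny deltas auto-resolve as FX rounding,
    anything larger goes to the manual queue. Amounts are in minor units."""
    delta = abs(expected_minor - settled_minor)
    if delta == 0:
        return "matched"
    if delta <= fx_tolerance_minor:
        return "auto_resolved_fx_rounding"
    return "manual_review"

print(classify_delta(10_000, 10_001))  # auto_resolved_fx_rounding
```

The PM lever is precisely this function's threshold and rule set: every pattern you can safely auto-resolve shrinks the manual queue.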
PM lever: tighten the matching rules; reduce the manual queue; report weekly on "% of settlement auto-matched."
## Cross-border reconciliation — the harder version
Same flow, plus:
- FX conversion at one or more legs — your quoted rate vs PSP's actual rate.
- Currency-of-record at multiple ledgers — customer-currency, settlement-currency, base-currency.
- Timezone misalignment — your "yesterday" and the partner's "yesterday" may differ.
- Multiple correspondent banks for SWIFT corridors — each takes a cut.
Senior pattern: keep a shadow ledger in your base currency. Reconcile actual vs. expected daily and route deltas to an "FX P&L" account.
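A toy sketch of routing the delta. Account names and the entry shape are made up for illustration; real double-entry postings would be richer.

```python
def fx_delta_to_pnl(expected_base_minor: int, actual_base_minor: int) -> dict:
    """Compare expected vs. actual settlement in base-currency minor units;
    absorb any difference into a dedicated FX P&L account."""
    delta = actual_base_minor - expected_base_minor
    entries = [("settlement_clearing", actual_base_minor)]
    if delta:
        entries.append(("fx_pnl", -delta))  # delta absorbed by FX P&L
    return {"delta_minor": delta, "entries": entries}

# Quoted 10.00 in base units, PSP actually settled 9.97:
print(fx_delta_to_pnl(1000, 997))
```

The value of the pattern is that the core customer ledger never carries FX noise; all of it accumulates in one account you can monitor and price against.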
## Why payments data is messy
Tell-tale signs you should be prepared to discuss:
- Schema drift — each PSP has its own fields, casing, decline-code dictionary. Normalization is constant.
- Late events — settlements arrive T+N. Disputes arrive weeks later. Your "today's numbers" mutate.
- Idempotency artifacts — same event delivered twice; consumers must dedupe.
- Out-of-order events — webhook delivery is not ordered; you can see captured before authorized.
- Currency rounding — minor units differ (JPY has no decimals; BHD has 3).
- Partial refunds and disputes — multi-row events against one txn.
- Versioning — your decline-family taxonomy will evolve; analytics must handle historical breaks.
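The idempotency and ordering problems above share one standard fix: dedupe on (intent, sequence) and sort by sequence before consuming. A minimal sketch with hypothetical field names:

```python
def dedupe_and_order(events):
    """Drop duplicate deliveries (same intent + sequence) and emit events in
    per-intent sequence order, even if webhooks arrived out of order."""
    seen = set()
    unique = []
    for e in events:
        key = (e["intent_id"], e["sequence"])
        if key not in seen:  # idempotency: second delivery is ignored
            seen.add(key)
            unique.append(e)
    return sorted(unique, key=lambda e: (e["intent_id"], e["sequence"]))

raw = [
    {"intent_id": "pi_1", "sequence": 2, "type": "capture.completed"},
    {"intent_id": "pi_1", "sequence": 1, "type": "auth.approved"},
    {"intent_id": "pi_1", "sequence": 2, "type": "capture.completed"},  # duplicate
]
print([e["type"] for e in dedupe_and_order(raw)])
# ['auth.approved', 'capture.completed']
```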
## Warehouse vs operational stores
Two different data surfaces:
- Operational — Postgres / Cassandra-like. Realtime; low latency; supports the auth path. Per-record reads.
- Warehouse — Snowflake / BigQuery / Redshift. Batch loaded from operational + event stream. Analytics queries.
For a PM, most of your analysis lives in the warehouse: cohort cuts, AAR investigations, vendor scorecards. The operational store serves the real-time path. Don't confuse them in an interview.
## What a PM asks of the data pipeline
- Every auth attempt logged with the canonical field set (see 07).
- One stable identity (payment_intent_id) across attempts, captures, refunds, disputes.
- Late-arriving events handled — yesterday's numbers can update, but with an audit trail.
- Settlement events joined to auth events automatically; mismatches alerted.
- Dispute events linked to the underlying intent for representment context.
- Warehouse refresh SLA — analytics should be at most a few hours stale.
- PII handled — no PAN in warehouse; BIN OK; tokenized identifiers only.
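The "no PAN in warehouse" requirement is commonly enforced with a pre-load scan: flag any field containing a card-length digit run that passes the Luhn check. A sketch (regex and function names are illustrative; the Luhn algorithm itself is the standard card check-digit scheme):

```python
import re

PAN_LIKE = re.compile(r"\b\d{13,19}\b")  # card numbers are 13-19 digits

def luhn_valid(digits: str) -> bool:
    """Standard Luhn check-digit validation used by card numbers."""
    total, double = 0, False
    for ch in reversed(digits):
        d = int(ch)
        if double:
            d *= 2
            if d > 9:
                d -= 9
        total += d
        double = not double
    return total % 10 == 0

def scan_row(row: dict) -> list:
    """Return the names of fields whose values look like a PAN."""
    hits = []
    for key, value in row.items():
        for candidate in PAN_LIKE.findall(str(value)):
            if luhn_valid(candidate):
                hits.append(key)
    return hits

print(scan_row({"card_token": "tok_abc", "memo": "PAN 4242424242424242"}))
# ['memo']
```

BIN (6-8 digits) and last4 fall below the 13-digit floor, so legitimate analytics fields pass the scan untouched.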