Section D · Production

Data & Pipelines

A PM's-eye view of payments data — events, tokenization, reconciliation, ledgers. You don't build this, but you must understand it well enough to spec instrumentation and to read what comes out the other end.

The payments event model

Payments are best modeled as an append-only event stream around a stable identity — the payment intent. Common events per intent:

  • payment_intent.created — user initiates a deposit
  • payment_intent.requires_action — 3DS challenge required
  • auth.attempted — request sent to PSP
  • auth.approved / auth.declined — PSP responds
  • auth.cascaded — failover to alternate PSP
  • capture.requested / capture.completed — if auth/capture is split
  • settlement.scheduled / settlement.completed — funds moved by acquirer
  • refund.requested / refund.completed — refund flow
  • dispute.opened / dispute.evidence_submitted / dispute.resolved — chargeback lifecycle
  • ledger.posted — internal ledger entry written

Every event carries the stable payment_intent_id plus a monotonically increasing sequence number. Downstream consumers reconstruct state by replaying events in order.
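A minimal sketch of replay, assuming events arrive as dicts with a seq and a type; the state-transition map below is an illustrative assumption, not a canonical spec:

```python
# Replay an intent's events (sorted by sequence) into a current state.
# STATE_OF is an illustrative assumption; real lifecycles have more states.
STATE_OF = {
    "payment_intent.created": "created",
    "payment_intent.requires_action": "requires_action",
    "auth.attempted": "authorizing",
    "auth.approved": "authorized",
    "auth.declined": "declined",
    "capture.completed": "captured",
    "settlement.completed": "settled",
}

def replay(events):
    """events: list of dicts with 'seq' and 'type'; returns the latest state."""
    state = None
    for ev in sorted(events, key=lambda e: e["seq"]):
        state = STATE_OF.get(ev["type"], state)  # unknown events leave state as-is
    return state

events = [
    {"seq": 3, "type": "auth.approved"},
    {"seq": 1, "type": "payment_intent.created"},
    {"seq": 2, "type": "auth.attempted"},
]
print(replay(events))  # → authorized
```

The sort on seq is what makes replay deterministic even when delivery order isn't.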

Tokenization — where raw card data ends and tokens begin

From a data-pipeline perspective:

  • Raw PAN enters at the edge (iframe / SDK / hosted-fields).
  • Vault receives it; returns a vault token (internal).
  • Vault token may map to a scheme-issued network token (VTS/MDES) used downstream.
  • Everything after the vault uses tokens, never PAN. PCI scope is limited to the vault and its direct callers.
  • Analytics data never contains PAN. The BIN (first 6-8 digits) is fine; full PAN must never land in your warehouse.
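One way to enforce the last rule is a scrub step before any event lands in the warehouse — keep the BIN and the vault token, drop the PAN. A sketch, with illustrative field names:

```python
import re

# Warehouse guardrail: strip PAN, keep BIN (first 6-8 digits) + vault token.
# Field names ("pan", "vault_token", "bin") are illustrative assumptions.
PAN_RE = re.compile(r"^\d{12,19}$")

def scrub_for_warehouse(event):
    out = dict(event)
    pan = out.pop("pan", None)
    if pan and PAN_RE.match(pan):
        out["bin"] = pan[:6]  # BIN only; 8 digits is also acceptable
    assert "pan" not in out, "PAN must never reach the warehouse"
    return out

row = scrub_for_warehouse({"pan": "4242424242424242", "vault_token": "tok_123"})
print(row)  # {'vault_token': 'tok_123', 'bin': '424242'}
```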

Auth and settlement events — they live in different streams

Two distinct sources:

  • Authorization stream — synchronous, from PSP API responses. Real-time, per attempt.
  • Settlement stream — asynchronous, from PSP daily files / SFTP / API pulls. Per batch.

These do not arrive at the same time. Authorization is realtime; settlement files often arrive T+1 morning. Reconciliation joins the two streams on intent ID and amount. Mismatches indicate problems — a capture that never settled, a settlement with no matching auth, or an amount discrepancy.

Sources of truth — there are several

Source — what it knows — latency:

  • PSP API responses — auth attempt outcome — realtime
  • PSP settlement report — funds actually moved — T+1 typical
  • Acquirer settlement file — funds reaching the acquirer — T+1
  • Bank statement — funds in your nostro — T+1 / EOD
  • Internal ledger — customer balance state — realtime (post-event)
  • Card scheme reports — dispute / chargeback events — days to weeks

When sources disagree, your job is to define which wins for which question:

  • "Was the customer credited?" → internal ledger.
  • "Did the acquirer get paid?" → settlement report.
  • "Did the funds hit our bank?" → bank statement.
  • "Is the customer about to chargeback?" → scheme report.

Daily reconciliation cycle

Every morning, the recon job runs:

  1. Pull yesterday's PSP settlement files (SFTP, API, or via aggregator).
  2. Parse and normalize into the canonical settlement event schema.
  3. Join to internal auth events on intent_id + amount.
  4. Compute three buckets: matched, mismatched, missing.
  5. Surface to finance / treasury for human review.
  6. Auto-resolve patterns we know (FX rounding deltas, fee adjustments).
  7. Open tickets for genuine mismatches.
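Steps 3-4 can be sketched as a join on (intent_id, amount in minor units) that produces the three buckets; record shapes here are illustrative assumptions:

```python
# Daily recon sketch: join normalized settlement rows to internal auth
# events and bucket into matched / mismatched / missing.
def reconcile(auth_events, settlement_rows):
    auth_by_intent = {a["intent_id"]: a for a in auth_events}
    matched, mismatched, missing = [], [], []
    for s in settlement_rows:
        a = auth_by_intent.get(s["intent_id"])
        if a is None:
            missing.append(s)                 # settled with no known auth
        elif a["amount_minor"] == s["amount_minor"]:
            matched.append(s)
        else:
            mismatched.append((a, s))         # amount delta -> human review
    return matched, mismatched, missing

auths = [{"intent_id": "pi_1", "amount_minor": 1000},
         {"intent_id": "pi_2", "amount_minor": 2500}]
settles = [{"intent_id": "pi_1", "amount_minor": 1000},
           {"intent_id": "pi_2", "amount_minor": 2400},
           {"intent_id": "pi_9", "amount_minor": 300}]
m, mm, ms = reconcile(auths, settles)
print(len(m), len(mm), len(ms))  # 1 1 1
```

The auto-match rate in step 6 is then simply len(matched) over total settlement rows — the metric the PM lever below tracks weekly.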

PM lever: tighten the matching rules; reduce the manual queue; report weekly on "% of settlement auto-matched."

Cross-border reconciliation — the harder version

Same flow, plus:

  • FX conversion at one or more legs — your quoted rate vs PSP's actual rate.
  • Currency-of-record at multiple ledgers — customer-currency, settlement-currency, base-currency.
  • Timezone misalignment — your "yesterday" and the partner's "yesterday" may differ.
  • Multiple correspondent banks for SWIFT corridors — each takes a cut.

Senior pattern: keep a shadow ledger in your base currency. Reconcile actual vs expected daily and route deltas to an "FX P&L" account.
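The shadow-ledger delta is just expected-at-quoted-rate vs actual-at-settled-rate in base currency. A sketch, with illustrative rates and Decimal to avoid float rounding in money math:

```python
from decimal import Decimal

# Expected base-currency amount at your quoted rate vs actual at the
# PSP's settled rate; the delta routes to the FX P&L account.
# Amounts and rates below are illustrative assumptions.
def fx_delta(amount_src, quoted_rate, actual_rate):
    expected = amount_src * quoted_rate
    actual = amount_src * actual_rate
    return actual - expected  # positive = FX gain, negative = FX loss

delta = fx_delta(Decimal("100.00"), Decimal("0.9150"), Decimal("0.9138"))
print(delta)  # -0.120000
```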

Why payments data is messy

Tell-tale signs you should be prepared to discuss:

  • Schema drift — each PSP has its own fields, casing, decline-code dictionary. Normalization is constant.
  • Late events — settlements arrive T+N. Disputes arrive weeks later. Your "today's numbers" mutate.
  • Idempotency artifacts — same event delivered twice; consumers must dedupe.
  • Out-of-order events — webhook delivery is not ordered; you can see captured before authorized.
  • Currency rounding — minor units differ (JPY has no decimals; BHD has 3).
  • Partial refunds and disputes — multi-row events against one txn.
  • Versioning — your decline-family taxonomy will evolve; analytics must handle historical breaks.
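Two of the messes above — duplicate delivery and out-of-order webhooks — are typically handled in the same consumer: dedupe on event id, then buffer until the sequence is contiguous. A sketch, with illustrative field names:

```python
# Webhook consumer sketch: dedupe on event_id (idempotency), buffer
# out-of-order events, apply only when the sequence has no gaps.
def consume(raw_events):
    seen, buffer, applied, next_seq = set(), {}, [], 1
    for ev in raw_events:
        if ev["event_id"] in seen:
            continue                  # duplicate delivery: drop silently
        seen.add(ev["event_id"])
        buffer[ev["seq"]] = ev
        while next_seq in buffer:     # apply in order once gaps close
            applied.append(buffer.pop(next_seq)["type"])
            next_seq += 1
    return applied

stream = [
    {"event_id": "e2", "seq": 2, "type": "captured"},    # arrives early
    {"event_id": "e1", "seq": 1, "type": "authorized"},
    {"event_id": "e1", "seq": 1, "type": "authorized"},  # duplicate
]
print(consume(stream))  # ['authorized', 'captured']
```

Note that "captured" is held until "authorized" arrives — the consumer never sees a capture before its auth even though the wire delivered them that way.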

Warehouse vs operational stores

Two different data surfaces:

  • Operational — Postgres / Cassandra-like. Realtime; low latency; supports the auth path. Per-record reads.
  • Warehouse — Snowflake / BigQuery / Redshift. Batch loaded from operational + event stream. Analytics queries.

For a PM: most of your analysis lives in the warehouse — cohort cuts, AAR investigations, vendor scorecards. The operational store serves the real-time path. Don't confuse the two in an interview.

What a PM asks of the data pipeline

  • Every auth attempt logged with the canonical field set (see 07).
  • One stable identity (payment_intent_id) across attempts, captures, refunds, disputes.
  • Late-arriving events handled — yesterday's numbers can update, but with audit.
  • Settlement events joined to auth events automatically; mismatches alerted.
  • Dispute events linked to the underlying intent for representment context.
  • Warehouse refresh SLA — analytics should be at most a few hours stale in most cases.
  • PII handled — no PAN in warehouse; BIN OK; tokenized identifiers only.
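Several of these asks collapse into one canonical record shape. A sketch of what an auth-attempt row might look like — field names beyond payment_intent_id are illustrative assumptions, not the canonical field set referenced in 07:

```python
from dataclasses import dataclass
from typing import Optional

# One possible canonical auth-attempt record satisfying the asks above.
@dataclass(frozen=True)
class AuthAttempt:
    payment_intent_id: str        # stable identity across the lifecycle
    attempt_seq: int              # per-intent monotonic sequence
    psp: str                      # which provider handled the attempt
    vault_token: str              # tokenized identifier; never PAN
    card_bin: str                 # first 6-8 digits only
    amount_minor: int             # minor units (safe for JPY 0dp, BHD 3dp)
    currency: str
    outcome: str                  # approved / declined
    decline_family: Optional[str] = None
    schema_version: int = 1       # survive historical taxonomy breaks

a = AuthAttempt("pi_1", 1, "psp_a", "tok_123", "424242", 1000, "USD", "approved")
print(a.payment_intent_id, a.outcome)
```

Storing amounts in minor units and carrying a schema_version are the two choices that pay off later — in currency-rounding recon and in decline-taxonomy migrations respectively.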