Chapter 04 · Working artifact

Orchestration Patterns

Five architectural patterns for routing verification traffic to one or more vendors — single, waterfall, A/B, geo-routed, and decision-engine. With trade-offs, when to pick each, and config-shape examples you can adapt.

Build an abstraction layer first — before any pattern

Whichever pattern you adopt, do not let vendor SDKs leak into your product domain. The single most reused piece of advice in this whole playbook: wrap the vendor in your own interface from day one, even if you only have one vendor.

from dataclasses import dataclass
from enum import Enum
from typing import Optional, Protocol

class Decision(str, Enum):
    APPROVED       = "approved"
    DECLINED       = "declined"
    MANUAL_REVIEW  = "manual_review"
    INDETERMINATE  = "indeterminate"   # rare; vendor failure / error

@dataclass(frozen=True)
class VerificationRequest:
    user_id: str
    jurisdiction: str            # ISO 3166-1 alpha-2
    tier: str                    # "lite" | "full" | "kyb"
    requested_modules: list[str] # ["doc", "biometric", "sanctions", "pep", "kyb"]
    locale: str
    return_url: str
    idempotency_key: str

@dataclass(frozen=True)
class VerificationResult:
    decision: Decision
    confidence: float                          # 0.0–1.0
    vendor: str
    vendor_reference: str                      # vendor's case ID for audit
    reasons: list[str]                         # vendor-mapped to your taxonomy
    raw_response: dict                         # for audit; never surfaced to UI
    latency_ms: int
    cost_cents: int                            # for unit economics tracking
    expires_at: Optional[str] = None           # for re-verification

class IDVProvider(Protocol):
    """Vendor-agnostic interface. Every vendor adapter implements this."""
    name: str
    def start(self, req: VerificationRequest) -> str: ...           # returns session_id / URL
    def status(self, session_id: str) -> VerificationResult: ...
    def webhook(self, payload: dict, signature: str) -> VerificationResult: ...

Why this matters more than your vendor choice

The abstraction layer is what makes every other pattern in this chapter cheap. With it, swapping vendor B for vendor C is a 1-week engineering project. Without it, every pattern below becomes a 3-month rebuild. Build the abstraction even if you currently have only one vendor.
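
To make this concrete, here is a minimal sketch of one vendor adapter behind the interface, assuming the Decision, VerificationRequest, and VerificationResult types defined above. The SDK calls (create_session, get_session), the response field names, and the flat 150-cent cost are illustrative assumptions rather than any real vendor's API; the point is that the translation from vendor-speak into your Decision taxonomy lives in exactly one place.

import time

class VendorAAdapter:
    """Hypothetical adapter mapping a fictional Vendor A SDK onto IDVProvider."""
    name = "vendor_a"

    # Illustrative mapping from vendor-specific verdict strings into your taxonomy.
    _DECISION_MAP = {
        "PASS":   Decision.APPROVED,
        "FAIL":   Decision.DECLINED,
        "REVIEW": Decision.MANUAL_REVIEW,
    }

    def __init__(self, client):                      # client: hypothetical vendor SDK instance
        self._client = client

    def start(self, req: VerificationRequest) -> str:
        # Hypothetical SDK call; real vendors differ in naming and payload shape.
        session = self._client.create_session(
            reference=req.idempotency_key,
            locale=req.locale,
            redirect_url=req.return_url,
        )
        return session["id"]

    def status(self, session_id: str) -> VerificationResult:
        started = time.monotonic()
        raw = self._client.get_session(session_id)   # hypothetical SDK call
        return VerificationResult(
            decision=self._DECISION_MAP.get(raw.get("verdict"), Decision.INDETERMINATE),
            confidence=float(raw.get("score", 0.0)),
            vendor=self.name,
            vendor_reference=raw.get("case_id", session_id),
            reasons=raw.get("reason_codes", []),
            raw_response=raw,
            latency_ms=int((time.monotonic() - started) * 1000),
            cost_cents=150,                          # assumed contract price, for unit-economics tracking
        )

    def webhook(self, payload: dict, signature: str) -> VerificationResult:
        # Verify the signature against the vendor's webhook secret before trusting the payload
        # (scheme is vendor-specific; omitted here), then treat it like a status fetch.
        return self.status(payload["session_id"])

Because IDVProvider is a structural interface, the adapter only has to match its shape. Swapping vendor B for vendor C means writing one new adapter like this and changing the routing config, not touching product code.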

Pattern 1 — Single vendor

One provider. Simplest. Sufficient for many companies.

         ┌────────────┐
user ─►  │  Your app  │ ─► Vendor A ─► Decision
         └────────────┘

Pros

  • Cheapest to integrate and operate.
  • Best volume tier: all your volume sits with one vendor, so the best per-check pricing.
  • Compliance posture is simple to explain to regulators.
  • One support escalation path during incidents.

Cons

  • No redundancy. Vendor outage = signup outage.
  • No negotiating leverage at renewal — they know you're locked in.
  • No A/B comparison to benchmark performance.
  • Geographic gaps in vendor's coverage become your gaps.

Pick this when

Pre-product-market-fit; single jurisdiction; volume below ~250K checks/year. Plan to revisit at 500K+.

Pattern 2 — Waterfall (cascade)

Primary vendor handles everything. On rejection or indeterminate result, secondary vendor gets a second look. Reduces false-reject rate at the cost of unit economics and complexity.

         ┌────────────┐
user ─►  │  Your app  │ ─► Vendor A ──approve──────────────────────► Decision
         └────────────┘         │
                                └──reject / indeterminate──► Vendor B ──decide──► Decision

Pros

  • Higher overall completion rate; recovers users vendor A would reject.
  • Some redundancy: if vendor A is degraded, you have an option.
  • Cleaner regulator story than parallel/A/B ("we use vendor A; we use B as backup").

Cons

  • You pay vendor A and vendor B for users that fall through.
  • Latency stacks: the worst case is vendor A timeout + vendor B full flow.
  • Vendor B is now in your data path — full DD applies to both vendors.
  • Decision provenance can confuse auditors: clearly tag which vendor's evidence the final decision rests on.

Config example

orchestration:
  pattern: waterfall
  steps:
    - vendor: vendor_a
      stop_on: [approved]
      fallback_on: [declined, indeterminate]
      timeout_ms: 30000
    - vendor: vendor_b
      stop_on: [approved, declined]
      fallback_on: [indeterminate]
      timeout_ms: 30000
  on_terminal_indeterminate: manual_review
  audit:
    record_all_vendor_responses: true
    final_decision_provenance: "last_terminal_vendor"
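
A sketch of how an orchestrator might execute that config against the IDVProvider interface. The WaterfallStep wiring is an assumption, and the synchronous status() call stands in for polling or awaiting the webhook.

from dataclasses import dataclass
from typing import Optional

@dataclass
class WaterfallStep:
    provider: IDVProvider
    stop_on: set[Decision]        # terminal here: this vendor's result is final
    fallback_on: set[Decision]    # fall through to the next step

def run_waterfall(steps: list[WaterfallStep], req: VerificationRequest) -> Optional[VerificationResult]:
    """Try each step in order; return the first terminal result."""
    last_result: Optional[VerificationResult] = None
    for step in steps:
        session_id = step.provider.start(req)
        result = step.provider.status(session_id)   # in practice: poll, or await the webhook
        last_result = result
        if result.decision in step.stop_on:
            return result                           # decision provenance: this vendor's evidence
        if result.decision not in step.fallback_on:
            break                                   # e.g. a manual_review verdict: stop cascading
    # Steps exhausted. Per on_terminal_indeterminate above, an INDETERMINATE
    # result here is routed to manual review by the caller.
    return last_result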

Pick this when

You have a measurable false-reject problem (≥3%) on your primary vendor. Or you operate in a jurisdiction where one vendor doesn't cover a meaningful sub-population (e.g., asylum-seeker documents). Don't adopt this before you've measured the false-reject rate; otherwise you'll just pay more.

Pattern 3 — A/B (cohort split)

Each user is assigned to a vendor based on a stable hash. You measure both in production and use the data to make a final call — or to keep both active as a benchmarking arrangement.

user ─► hash(user_id) % 100
            ├─ 0–49  ──► Vendor A ──► Decision
            └─ 50–99 ──► Vendor B ──► Decision
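
A minimal sketch of the sticky assignment. It uses hashlib.sha256 deliberately, because Python's built-in hash() is salted per process and therefore not stable across restarts; the splits shape mirrors the config example below.

import hashlib

def assign_vendor(user_id: str, splits: list[tuple[str, int]]) -> str:
    """Sticky assignment: the same user_id always lands in the same bucket.
    splits: [("vendor_a", 50), ("vendor_b", 50)]; weights must sum to 100."""
    digest = hashlib.sha256(user_id.encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % 100
    cumulative = 0
    for vendor, weight in splits:
        cumulative += weight
        if bucket < cumulative:
            return vendor
    return splits[0][0]   # unreachable if weights sum to 100; safe default otherwise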

Pros

  • Real, head-to-head measurement on your population.
  • Ongoing negotiating leverage — both vendors know they're being compared.
  • Risk diversification: one vendor's outage takes 50% of users, not 100%.
  • Easy fallback if a vendor degrades — flip the % to 0 in seconds.

Cons

  • Worst volume tier — you split commitments across two vendors.
  • Inconsistent user experience (if you upsell to the same user later, the cohort lock-in matters).
  • More engineering: two integrations, two webhooks, two on-call escalation paths.
  • Regulator may ask "why two vendors" — answer should be measurement / redundancy, not "we couldn't decide."

Config example

orchestration:
  pattern: ab
  assignment:
    key: user_id
    sticky: true
    splits:
      - vendor: vendor_a
        weight: 50
      - vendor: vendor_b
        weight: 50
  emergency_override:
    enabled: true
    fallback_vendor: vendor_a
    trigger:
      vendor_b_error_rate_5m: ">= 0.05"

Pick this when

You're at a scale where 5–10% volume per arm produces stat-sig results in a 4-week window (~50K+ checks per arm). You want measurement-driven choice or ongoing pricing leverage from two real options.

Sample-size sanity check

To detect a 1-point completion-rate difference (e.g., 86% vs 87%) with p=0.05 and 80% power, you need ~25K verifications per arm. For a 0.5-point false-reject difference (e.g., 2.0% vs 2.5%), it's closer to ~50K per arm. Plan A/B duration around these, not vibes.

Pattern 4 — Geo-routing

Different vendors for different countries. Common when no single vendor wins on coverage everywhere — typically with an EM-focused vendor in LATAM / MENA / CIS and an incumbent in NA / EU.

user ─► country=US          ──► Vendor A
        country=MX|BR|CO    ──► Vendor D
        country=DE|AT|CH    ──► Vendor IDnow-style
        country=*           ──► Vendor A (default)

Pros

  • Best per-country coverage and completion rates.
  • Can negotiate vendor-specific deals (volume tier + market focus).
  • Aligns with regulator preferences in regulated jurisdictions.

Cons

  • Operational complexity: every new vendor is a new full DD + integration.
  • You may end up with 4+ vendors, which is painful at audit time.
  • Edge cases at borders (a German user signing up from a US IP).
  • Aggregate reporting requires normalization across vendors.

Config example

{
  "orchestration": {
    "pattern": "geo_routing",
    "rules": [
      { "match": { "country": ["DE", "AT", "CH"] },        "vendor": "vendor_idnow" },
      { "match": { "country": ["MX", "BR", "CO", "AR"] }, "vendor": "vendor_incode" },
      { "match": { "country": ["IN"] },                    "vendor": "vendor_sumsub" },
      { "match": { "country": ["US", "CA"] },              "vendor": "vendor_persona" },
      { "match": { "country": "*" },                       "vendor": "vendor_persona" }
    ],
    "country_resolution": {
      "primary":  "user_declared",
      "verify_against": ["ip_geolocation", "document_country"],
      "on_mismatch": "manual_review"
    }
  }
}
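
A sketch of how rule matching and the country-resolution check might look in code. The rule dictionaries mirror the JSON above; first-match-wins ordering with a "*" catch-all is an assumption.

from typing import Optional

def resolve_country(declared: str, ip_country: Optional[str], doc_country: Optional[str]) -> tuple[str, bool]:
    """Returns (country_to_route_on, mismatch_flag). Mismatches are routed to manual review."""
    signals = [c for c in (ip_country, doc_country) if c]
    mismatch = any(c != declared for c in signals)
    return declared, mismatch

def route_vendor(country: str, rules: list[dict]) -> str:
    """First-match-wins over the rules list; '*' is the catch-all default."""
    for rule in rules:
        match_countries = rule["match"]["country"]
        if match_countries == "*" or country in match_countries:
            return rule["vendor"]
    raise ValueError("no catch-all rule configured")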

Pick this when

You serve 3+ jurisdictions with materially different document landscapes (e.g., DE/AT/CH video-ident plus EM countries) and your top vendor's coverage matrix has obvious gaps.

Pattern 5 — Decision-engine (weighted scoring)

Each verification produces a score from multiple signals (vendor decision, vendor confidence, sanctions hit, internal risk model, device signals, behavioral analytics). A decision engine combines them into the final approve/decline/review verdict — sometimes using multiple vendors in parallel for the same user.

user ─► parallel:
            ├─► Vendor A (doc + biometric)
            ├─► Vendor B (sanctions / PEP)
            ├─► Internal fraud model (device + behavioral)
            └─► Internal sanctions cache
                        │
                        ▼
            Decision engine (weighted) ─► Decision

Pros

  • Highest accuracy in mature operations; combines best-of-breed signals.
  • Granular control: tune weights per cohort / jurisdiction / risk tier.
  • Future-proof: adding a new signal is a weight change, not an architectural shift.
  • Internal rules engine becomes the source of truth — vendors are subordinated.

Cons

  • Highest complexity; requires a real risk / decisioning function to operate.
  • Worse unit economics: most signals are billed independently, so a single verification can incur several separate vendor charges.
  • Hardest to explain to regulators ("show me your decision logic"). You'll need versioned rules, model cards, full audit trails.
  • Decision-engine ownership becomes a long-lived team commitment.

Config example (Python-shape)

def decide(signals: dict, tier: str) -> Decision:
    """
    signals: {
      'vendor_a_decision':   'approved',
      'vendor_a_confidence': 0.92,
      'sanctions_hit':       False,
      'pep_match':           False,
      'fraud_score':         0.13,     # internal fraud model 0–1
      'device_risk':         'low',
      'doc_country_matches_ip': True,
    }
    """
    if signals['sanctions_hit']:
        return Decision.MANUAL_REVIEW  # mandatory human review on hits

    if signals['vendor_a_decision'] == 'declined':
        # Vendor A is authoritative on doc/biometric rejection
        return Decision.DECLINED

    # Composite score: 0–100, higher = more confident approval
    score = 0
    score += int(signals['vendor_a_confidence'] * 50)            # max 50
    score += 20 if signals['device_risk'] == 'low' else 0
    score += 15 if signals['doc_country_matches_ip'] else 0
    score -= int(signals['fraud_score'] * 30)                    # penalty
    if signals['pep_match']:
        score -= 25

    thresholds = {
        'lite': {'approve': 60, 'review': 35},
        'full': {'approve': 75, 'review': 55},
        'kyb':  {'approve': 80, 'review': 60},
    }[tier]

    if score >= thresholds['approve']:
        return Decision.APPROVED
    if score >= thresholds['review']:
        return Decision.MANUAL_REVIEW
    return Decision.DECLINED
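
The decide() function assumes the signals have already been gathered. One way to do the parallel fan-out from the diagram is to wrap each signal source (vendor call, sanctions screen, internal model) as an async callable and merge the outputs. A sketch, with the actual source wiring left as an assumption:

import asyncio
from typing import Awaitable, Callable

async def collect_signals(
    req: VerificationRequest,
    sources: dict[str, Callable[[VerificationRequest], Awaitable[dict]]],
) -> dict:
    """Fan out to all signal sources in parallel and merge their outputs into one dict.

    sources maps a label to an async callable, e.g. a wrapped IDVProvider call,
    a sanctions screen, or an internal fraud model; the wiring is up to you."""
    names = list(sources)
    results = await asyncio.gather(*(sources[name](req) for name in names))
    merged: dict = {}
    for name, result in zip(names, results):
        # Each source returns a flat dict of signals, e.g. {"vendor_a_confidence": 0.92}.
        merged.update(result)
    return merged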

Pick this when

You're at $XXM+ revenue, have a dedicated risk function, run multiple jurisdictions, and your false-reject economics justify the engineering investment. Don't start here.

How to choose

Pattern           Eng cost      Vendor cost          Resilience    Leverage         When to choose
Single            Low           Best                 Low           None             Pre-PMF; <250K/yr; one jurisdiction
Waterfall         Medium        Worse (pay both)     Medium        Medium           Have measured false-reject >3%
A/B               Medium-High   Worse (split tier)   High          High (ongoing)   Volume supports stat-sig comparisons (≥50K/arm/month)
Geo-routing       High          Mixed                Medium-High   Medium           3+ jurisdictions with no single coverage winner
Decision-engine   Very High     Highest              Very High     Very High        Mature risk function, >$XXM revenue

Typical evolution path

Most companies pass through these in order:

  1. Year 0–1: Single vendor. Build the abstraction layer. Measure baseline.
  2. Year 1–2: Add a second vendor for either waterfall (if false-reject is the problem) or geo (if coverage is the problem). Many teams stop here.
  3. Year 2–3: A/B on top of geo, to keep both shortlist vendors honest and competitive.
  4. Year 3+: Build the decision engine. Vendors become signal providers, not decision-makers.

Trying to start at step 4 has burned a lot of teams: a decision engine without baseline data isn't measurably better than a single vendor.

Observability invariants — same for every pattern

Whichever pattern you adopt, these dashboards and audit invariants must exist before you flip the switch. They're what you use to debug, what regulators ask for, and what tells you when to switch vendors.

Per-verification audit record

Every verification — successful, failed, abandoned, retried — must produce one canonical event with the schema below. Stored append-only, retained per regulatory floor.

{
  "verification_id":     "ver_01HZX...",
  "user_id":             "usr_01HZX...",
  "tier":                "full",
  "jurisdiction":        "DE",
  "orchestration_pattern": "geo_routing",
  "vendor_calls": [
    {
      "vendor":            "vendor_idnow",
      "vendor_reference":  "idnow-12345",
      "started_at":        "2026-05-12T13:14:00Z",
      "completed_at":      "2026-05-12T13:14:42Z",
      "latency_ms":        42000,
      "decision":          "approved",
      "confidence":        0.94,
      "cost_cents":        185,
      "raw_response_ref":  "s3://audit-raw/ver_01HZX/idnow-12345.json"
    }
  ],
  "internal_signals": {
    "fraud_score":          0.07,
    "device_risk":          "low",
    "sanctions_hit":        false
  },
  "final_decision":      "approved",
  "final_decision_by":   "decision_engine_v3.2.1",
  "decided_at":          "2026-05-12T13:14:43Z",
  "audit_signature":     "sha256:..."
}
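
One simple scheme for the audit_signature is a content hash over a canonical serialization of the record, appended one JSON line per event. A sketch; hash-chaining each record to the previous one is a further hardening step this doesn't show.

import hashlib
import json

def sign_and_append(record: dict, log_path: str = "audit_events.jsonl") -> dict:
    """Compute a content hash over the canonical record, then append it as one JSON line."""
    # Hash is computed over the record before the signature field is added.
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"))
    signed = {**record, "audit_signature": "sha256:" + hashlib.sha256(canonical.encode()).hexdigest()}
    with open(log_path, "a", encoding="utf-8") as f:   # append-only by convention; enforce at the storage layer
        f.write(json.dumps(signed, sort_keys=True) + "\n")
    return signed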

Dashboards (minimum set)

  • Funnel: started → submitted → decided → approved, by vendor, by country.
  • Decision mix: approve / decline / manual-review / indeterminate, by vendor.
  • Latency: p50 / p95 / p99 time-to-decision, by vendor, by decision type.
  • Error rate: vendor 5xx / timeout / contract violation, by vendor, last 24h.
  • Unit cost: blended cost-per-decision, by vendor, by decision type.
  • Manual-review queue: depth, age, throughput, by vendor.
  • Dispute rate: users who claim they were wrongly declined, by vendor.
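
Most of these cut the same way, by vendor and by decision type, so they can all be derived from the per-verification audit record above. A sketch of one such rollup (decision mix and blended cost-per-decision), assuming audit events are available as a list of dicts in that schema:

from collections import Counter, defaultdict

def dashboard_rollup(events: list[dict]) -> dict:
    """Decision mix and blended cost-per-decision, keyed by the vendor whose evidence decided the case."""
    mix: dict = defaultdict(Counter)
    decisions: Counter = Counter()
    cost_cents: Counter = Counter()
    for ev in events:
        vendor = ev["vendor_calls"][-1]["vendor"]    # last vendor call carries the deciding evidence
        decisions[vendor] += 1
        mix[vendor][ev["final_decision"]] += 1
        cost_cents[vendor] += sum(call["cost_cents"] for call in ev["vendor_calls"])
    return {
        vendor: {
            "decision_mix": dict(mix[vendor]),
            "blended_cost_per_decision_cents": cost_cents[vendor] / decisions[vendor],
        }
        for vendor in decisions
    }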

If you don't have these dashboards before launch, you don't have launch

The number-one cause of "we don't know why our completion rate dropped" is missing dashboards. Build them in the integration phase, not after the incident.