Section A · Orient

The Role, Decoded

What "AI agents in Compliance" actually means in practice, what each stack piece is for, and the constraints that shape every design answer.

What the team actually does

A modern Global Compliance team at a fintech or crypto exchange is the regulatory backbone — AML/KYC, sanctions screening, financial crime investigations, regulatory reporting across many countries and licensed entities. Day in and day out, the team produces and reviews:

  • Alerts — automated systems flag a transaction, account behavior, or KYC mismatch. A human reviews each one.
  • Cases — when an alert escalates, an investigator opens a case, gathers evidence, writes a narrative, and decides: dismiss, escalate, or file a SAR (Suspicious Activity Report).
  • Regulatory updates — new rules published by FinCEN, FCA, MAS, BaFin, etc. Compliance has to read, summarize, and adjust controls.
  • EDD reports — Enhanced Due Diligence on high-risk customers (politically exposed persons, high-volume traders, etc.).
  • Audit responses — when a regulator examines, you produce evidence of what you did, when, and why.

All of this is document-heavy, judgment-heavy, and slow. It's exactly where LLMs help — if you can prove the AI didn't make things up, can show your work, and can recover when it goes wrong.

What this role builds

Job descriptions vary, but they tend to converge on something like:

Representative JD language

"Architect agentic workflows using n8n, Claude API, Python, and MCP that take real actions in Compliance processes with appropriate human oversight and approval gates."

Translation: build AI agents that do real work but never act autonomously. Every action passes through human approval. Examples:

  • Alert pre-screening: agent reads the alert + transaction history + KYC, drafts a triage recommendation ("dismiss" / "escalate"). A human approves.
  • Case narrative generation: agent assembles facts from multiple systems into a draft narrative. Investigator edits and signs.
  • Regulatory change summarization: agent ingests new regulations, drafts impact analysis. Compliance lead reviews.
  • EDD drafting: agent pulls public records, transaction patterns, sanctions-list checks; drafts the EDD report. Human reviews.
  • Audit-ready documentation: every agent action logged, replayable, attributable.
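The last bullet is the architectural anchor, so it is worth sketching. A minimal, hypothetical shape for one replayable audit record; the fields and workflow names are illustrative, not any vendor's schema:

```python
# Hypothetical audit record for a single agent action. Field names are
# illustrative assumptions, not a known case-management schema.
from dataclasses import dataclass, field, asdict
from datetime import datetime, timezone
import json
import uuid


@dataclass
class AgentActionRecord:
    workflow: str                  # e.g. "alert-triage"
    model: str                     # the model and version actually used
    prompt_version: str            # versioned prompt/template identifier
    input_refs: list[str]          # IDs of the source documents, not raw PII
    recommendation: str            # what the agent proposed
    rationale: str                 # the reasoning shown to the reviewer
    reviewer: str | None = None    # filled in at the approval gate
    human_decision: str | None = None
    record_id: str = field(default_factory=lambda: str(uuid.uuid4()))
    created_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

    def to_json(self) -> str:
        """Serialize for an append-only store so the action can be replayed later."""
        return json.dumps(asdict(self), sort_keys=True)


record = AgentActionRecord(
    workflow="alert-triage",
    model="claude-sonnet (placeholder)",
    prompt_version="triage-v3",
    input_refs=["alert:48211", "kyc:cust-993"],
    recommendation="escalate",
    rationale="Velocity spike inconsistent with stated occupation; counterparty on internal watchlist.",
)
print(record.to_json())
```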

What "appropriate human oversight" implies for design

This phrase appears repeatedly in JDs for these roles. It's the defining constraint. Concretely:

  • Risk tiering: not all AI tasks have equal stakes. Drafting a regulatory summary for internal use is low-tier. Recommending a SAR filing is high-tier. Architecture must reflect this.
  • Approval gates: humans confirm before any external action (filing, sending, deciding).
  • Reversibility: prefer designs where AI proposes and humans dispose, rather than autonomous chains.
  • Explainability: every recommendation must show its inputs, sources, and reasoning chain.
  • Stop conditions: the agent must know when to bail to a human.
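A minimal sketch of how the first two bullets might be expressed in code; the tiers, task names, and gate rule are assumptions for illustration, not a prescribed policy:

```python
# Illustrative risk-tier map and approval gate. Tier assignments and the rule
# are assumptions, not a prescribed compliance policy.
from enum import Enum


class RiskTier(Enum):
    LOW = "low"        # internal-use drafts and summaries
    MEDIUM = "medium"  # content that lands in a case record
    HIGH = "high"      # anything feeding a regulatory decision or filing


TASK_TIERS = {
    "regulatory_summary": RiskTier.LOW,
    "case_narrative_draft": RiskTier.MEDIUM,
    "sar_filing_recommendation": RiskTier.HIGH,
}


def requires_human_approval(task: str) -> bool:
    """Unclassified tasks default to the strictest treatment."""
    tier = TASK_TIERS.get(task, RiskTier.HIGH)
    return tier in (RiskTier.MEDIUM, RiskTier.HIGH)


assert requires_human_approval("sar_filing_recommendation")
assert requires_human_approval("unknown_task")  # unknown work is treated as high-tier
```

The useful property is the default: work the system has not classified gets the heaviest gate, not the lightest.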

The shortcut

If you internalize this, half the design questions answer themselves. "How would you build X?" becomes "Risk-tier it, identify the human checkpoint, design the audit log first, then the agent."

The tech stack — what each piece is for

n8n

A low-code workflow automation tool. Think Zapier, but self-hostable, more developer-friendly, and now with strong AI/LangChain primitives. In compliance contexts:

  • Visual workflow editor (helpful for non-engineers reviewing the flow)
  • Built-in nodes for HTTP, databases, queues, Slack, email
  • Native Claude / OpenAI / agent nodes
  • Self-hostable (matters for data residency / regulated data)

You don't need to be an n8n wizard, but you should be able to say: "n8n is the orchestration layer — workflow, retries, approvals, branching. Heavy AI logic lives in Python or Claude API calls; n8n stitches them together with human-in-the-loop steps." See 05-harnesses-and-agents.
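One way that split might look, assuming a small FastAPI service that n8n reaches through its HTTP Request node; the endpoint name and payload fields are hypothetical:

```python
# Hypothetical service holding the model-heavy logic; n8n owns the workflow,
# the retries, and the human approval step, and just calls this endpoint.
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()


class TriageRequest(BaseModel):
    alert_id: str
    alert_summary: str
    kyc_summary: str


class TriageDraft(BaseModel):
    alert_id: str
    recommendation: str  # "dismiss" or "escalate", always a draft, never a final decision
    rationale: str


@app.post("/triage-draft", response_model=TriageDraft)
def triage_draft(req: TriageRequest) -> TriageDraft:
    # In a real build, the Claude API call and its evidence assembly live here;
    # it is stubbed so the contract with n8n stays the focus.
    return TriageDraft(
        alert_id=req.alert_id,
        recommendation="escalate",
        rationale="Stubbed response; the model call would go here.",
    )
```

n8n then routes the returned draft to a reviewer (Slack, email, a form node) and only continues the workflow once a named human approves.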

Claude API / Anthropic SDK

Anthropic's API for the Claude models. Key things to know cold:

  • Models: Opus (most capable), Sonnet (balanced), Haiku (fast/cheap). Pick by task.
  • Tool use (function calling): models can call functions you define.
  • Prompt caching: cache long, stable context (system prompts, knowledge bases) for roughly a 90% discount on the cached portion of repeat calls.
  • Structured outputs: get JSON-shaped responses you can validate.
  • Long context: Claude supports very large contexts (200K-1M tokens depending on model). Useful for big regulatory PDFs, full case histories.
  • Computer use / agent SDK: Anthropic ships agent-building primitives.

See 03-ai-development.
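A minimal sketch tying the tool-use and prompt-caching bullets together in one Messages API call. The model string, tool name, and schema are placeholders; check the current Anthropic docs before reusing them:

```python
# Sketch only: model ID, tool name, and schema are placeholders.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

response = client.messages.create(
    model="claude-sonnet-4-20250514",  # placeholder; pick the current model by task
    max_tokens=1024,
    system=[
        {
            "type": "text",
            "text": (
                "You draft alert-triage recommendations for human review. "
                "Cite every source document you rely on."
            ),
            "cache_control": {"type": "ephemeral"},  # cache the long, stable instructions
        }
    ],
    tools=[
        {
            "name": "lookup_customer_kyc",  # hypothetical tool
            "description": "Fetch the KYC summary for a customer ID.",
            "input_schema": {
                "type": "object",
                "properties": {"customer_id": {"type": "string"}},
                "required": ["customer_id"],
            },
        }
    ],
    messages=[
        {"role": "user", "content": "Draft a triage recommendation for alert 48211."}
    ],
)

# The model either answers directly or asks to call the tool; inspect the blocks.
for block in response.content:
    print(block.type, getattr(block, "text", None) or getattr(block, "input", None))
```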

Python

The glue and analytics layer. Realistic uses in this role:

  • HTTP clients to compliance systems (Actimize, Quantexa, internal APIs)
  • Pandas / SQL for transaction analysis
  • Custom MCP servers (Python SDK exists)
  • Eval harnesses (most eval frameworks are Python-first)
  • Pre/post-processing around LLM calls
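A small illustration of the last two bullets: collapse raw transactions into something prompt-sized before any model sees them. The column names are assumptions about a generic transaction export:

```python
# Illustrative pre-processing: summarize transactions instead of pasting the raw
# ledger into a prompt. Column names are assumed, not from a specific system.
import pandas as pd

txns = pd.DataFrame({
    "date": pd.to_datetime(["2024-05-01", "2024-05-02", "2024-05-02", "2024-05-03"]),
    "amount_usd": [950.0, 9800.0, 9700.0, 120.0],
    "counterparty": ["ACME Ltd", "Shellco A", "Shellco A", "Coffee Shop"],
    "direction": ["in", "out", "out", "out"],
})

summary = (
    txns.groupby(["counterparty", "direction"])
        .agg(total_usd=("amount_usd", "sum"), txn_count=("amount_usd", "size"))
        .reset_index()
        .sort_values("total_usd", ascending=False)
)

# This compact table (plus derived flags such as "two outbound payments just
# under 10k within 24 hours") is what goes into the LLM prompt.
print(summary.to_string(index=False))
```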

MCP (Model Context Protocol)

Anthropic-led open protocol that standardizes how AI agents access tools, data, and prompts from external systems. Three primitives: tools (callable functions), resources (read-only data the model can consult), prompts (reusable templates). Transport layers: stdio, HTTP, SSE.

In compliance, MCP is how an agent reaches into your KYC database, your case management system, your regulatory document store — without you re-implementing the integration for every agent. See 04-mcp-deep-dive.
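A minimal server sketch, assuming the official MCP Python SDK's FastMCP helper; the tool and resource are hypothetical stand-ins for real compliance systems:

```python
# Sketch of an MCP server exposing one tool and one read-only resource.
# Names and return values are hypothetical.
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("compliance-context")


@mcp.tool()
def get_case_summary(case_id: str) -> str:
    """Return a short summary of a case from the case-management system (stubbed)."""
    return f"Case {case_id}: 2 linked alerts, status=open, assigned=investigations-emea"


@mcp.resource("policy://sanctions-screening")
def sanctions_policy() -> str:
    """Read-only policy text the model can consult while drafting."""
    return "Screen all counterparties against consolidated sanctions lists before onboarding."


if __name__ == "__main__":
    mcp.run()  # stdio transport by default, suitable for a local agent
```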

Compliance constraints that shape the architecture

JDs in this space typically call out:

  • GDPR (EU privacy, data minimization, right to erasure)
  • BSA (Bank Secrecy Act — US AML reporting)
  • FATF standards (international AML/CFT recommendations)
  • AML data retention (typically 5+ years, varies by jurisdiction)
  • Cross-jurisdictional privacy obligations

What this means for AI design:

  • Don't send customer PII to third-party APIs unnecessarily. Either run models in a controlled environment or have a contract/DPA + data classification policy.
  • Log everything, but log it in a way that supports legitimate erasure requests.
  • Cross-border data flows matter. EU customer data going to a US-based model may need scrutiny.
  • Retention periods — your audit logs must persist for the regulatory window, not just "until cache eviction."

You won't be quizzed on the law itself (you're an AI engineer, not a lawyer), but you should sound aware. "I'd flag this for a privacy review before sending PII to an external API" is a great answer.
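To make that concrete, here is a hypothetical pre-flight check before anything is sent to an external model; the tiers, field names, and regex are illustrative, and a real system would classify at the field level rather than guessing from free text:

```python
# Illustrative data-classification gate: block or reroute payloads that look
# like customer PII before they reach an external API. Not a real policy.
import re

ALLOWED_TIER_FOR_EXTERNAL = "internal"  # assumption: only non-PII leaves the boundary
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")


def classify(payload: dict) -> str:
    """Crude illustration; production systems rely on tagged fields, not regexes."""
    text = " ".join(str(v) for v in payload.values())
    if any(k in payload for k in ("passport_no", "national_id")) or EMAIL_RE.search(text):
        return "customer_pii"
    return "internal"


def safe_to_send(payload: dict) -> bool:
    if classify(payload) != ALLOWED_TIER_FOR_EXTERNAL:
        # Route to a privacy review / redaction step instead of the external API.
        return False
    return True


assert not safe_to_send({"customer_email": "a.person@example.com"})
assert safe_to_send({"alert_id": "48211", "pattern": "structuring-under-threshold"})
```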

Soft signals to expect in the JD

  • "Halt or redesign solutions posing regulatory risks" — they want someone willing to say no, not just ship.
  • "Capability multiplication" — you're expected to enable the team, not hoard tools. Think runbooks, training, documentation.
  • "Translate technical solutions for non-technical stakeholders" — practice explaining a concept (e.g. "what is an eval?") to a compliance officer in 30 seconds.

The seniority clause

JDs for these roles often ask for 8+ years of compliance ops or combined compliance-technical experience. If you don't have that, the reframe:

Believe this

JDs are written broadly to attract a senior pool, and recruiters bring in candidates who don't tick every box when there's signal somewhere else — adjacent skills, learning velocity, motivation, or referral. If you're interviewing, someone decided your conversation was worth their hour. Believe that. The interview is your chance to confirm it, not your chance to convince them you have years you don't.

Don't apologize for the years you lack, and don't claim experience you don't have. If asked directly, acknowledge it once, cleanly, redirect to what you do bring (focus, recent depth, motivation, transferable skills), and let the conversation continue. See 02-positioning-from-scratch for specific language.

What to ask them

Strong, role-fit questions to have ready:

  1. "What's the highest-stakes compliance workflow currently being considered for AI assist? What gates are non-negotiable?"
  2. "How is the team currently measuring whether an AI-drafted narrative or summary is good enough before a human signs it?" (You're probing their evals maturity.)
  3. "When an AI-driven decision is later questioned by a regulator, what does your replay/audit story look like today?"
  4. "Where do n8n vs custom Python vs MCP cleanly divide responsibilities in your existing builds, and where is the boundary fuzzy?"
  5. "What's the team's current biggest unsolved problem? Where would a new architect have the most leverage in their first 90 days?"

These show you're thinking about the role as an architect, not as a coder.