Hit a Real Sanctions API
Replace the mock SDN list with the OpenSanctions API. Learn what production tool design actually looks like.
You've completed the main MCP build tutorial. The previous stretches (resources, prompts) are not required but recommended.
From demo to real — what changes
The mocked lookup_sanctions_hit tool returned in microseconds with no failure modes. Real-world tool design isn't like that. Real APIs:
- Take 100ms-2s instead of microseconds
- Sometimes return 429 (rate limited), 500 (down), or timeout
- Return data in their schema, not yours
- Have terms of service and rate limits
- Cost money or have free-tier limits
- Need credential handling
Every production MCP tool wraps a flaky upstream system. The patterns you learn here — timeout, retry, fallback, caching, schema mapping — are the patterns for tool design in compliance work.
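The walkthrough below implements timeout, fallback, and caching explicitly; retry is left out to keep the tool short. For completeness, a generic retry-with-backoff sketch looks like this (`with_retries` is a hypothetical helper, not part of the tutorial code):

```python
import random
import time

def with_retries(call, max_attempts=3, base_delay=0.5):
    """Retry a flaky callable with exponential backoff and jitter.

    Retries on any exception for simplicity; a production version would
    retry only transient failures (timeouts, 429, 5xx). The last
    exception is re-raised so the caller's error handling still runs.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return call()
        except Exception:
            if attempt == max_attempts:
                raise
            # 0.5s, 1s, 2s, ... plus jitter, so concurrent clients
            # don't hammer a recovering upstream in lockstep
            time.sleep(base_delay * 2 ** (attempt - 1) + random.uniform(0, 0.1))
```

The jitter matters more than it looks: without it, every client that failed at the same moment retries at the same moment, re-creating the overload that caused the failure.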
OpenSanctions is a free, open-source aggregated sanctions and PEP database. Their API has a free tier with rate limits — perfect for learning. The matching endpoint (/match/sanctions) requires an API key — sign up free at opensanctions.org/account. For production you'd use a commercial vendor (Refinitiv, Dow Jones, ComplyAdvantage) with stricter SLAs and licensing fit for regulated use.
1. Install httpx (~2 min)
We'll use httpx for HTTP. It's a modern alternative to requests with first-class timeout support, and it works cleanly in both sync and async contexts.
pip install httpx
2. Replace the mock with the real call (~10 min)
First, get an OpenSanctions API key. Sign up at opensanctions.org/account and copy the key from your account page. The matching endpoint will return 401 Unauthorized without it.
Wire it into Claude Desktop via the server config (Claude Desktop spawns your server as a subprocess and passes env through):
{
"mcpServers": {
"compliance-toolkit": {
"command": "/absolute/path/to/compliance-mcp/.venv/bin/python",
"args": ["/absolute/path/to/compliance-mcp/server.py"],
"env": {
"OPENSANCTIONS_API_KEY": "your-key-here"
}
}
}
}
For standalone testing with mcp dev, export it in your shell first: export OPENSANCTIONS_API_KEY=your-key-here.
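A missing key otherwise surfaces as a confusing 401 on the first screening call mid-session. One option is to fail fast at startup instead; a minimal sketch (`require_api_key` is a hypothetical helper, not part of the tutorial code):

```python
import os

def require_api_key(env_var: str = "OPENSANCTIONS_API_KEY") -> str:
    """Hypothetical pre-flight check: raise at server startup if the
    key is missing, rather than letting the first tool call 401."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(
            f"{env_var} is not set. Export it in your shell (mcp dev) or "
            "add it to the 'env' block in the Claude Desktop config."
        )
    return key
```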
Open server.py. Add these imports at the top:
import os
import httpx
Now replace the entire lookup_sanctions_hit function with this real-API version. Keep the old SDN_LIST dictionary around — we'll use it as the fallback in step 3.
OPENSANCTIONS_BASE = "https://api.opensanctions.org"
OPENSANCTIONS_API_KEY = os.environ.get("OPENSANCTIONS_API_KEY", "")
@mcp.tool()
def lookup_sanctions_hit(name: str) -> dict:
"""Screen a person or entity name against the live OpenSanctions database
(OFAC SDN, EU Consolidated, UN, and ~250 other sanction sources).
Returns a hit record with match details if found, or an explicit no-match
record otherwise. Use for KYC, transaction counterparty, or onboarding screening.
Args:
name: Full legal name or entity name to screen. Case-insensitive.
"""
payload = {
"queries": {
"q1": {
"schema": "Thing", # Person | Organization | Thing (Thing matches both)
"properties": {"name": [name]},
}
}
}
try:
response = httpx.post(
f"{OPENSANCTIONS_BASE}/match/sanctions",
json=payload,
headers={"Authorization": f"Bearer {OPENSANCTIONS_API_KEY}"},
timeout=5.0,
)
response.raise_for_status()
results = response.json().get("responses", {}).get("q1", {}).get("results", [])
except httpx.TimeoutException:
return {
"match": "unknown",
"name_queried": name,
"error": "OpenSanctions API timed out after 5s — escalate to manual review",
"recommended_action": "manual-screening-required",
}
except httpx.HTTPStatusError as e:
return {
"match": "unknown",
"name_queried": name,
"error": f"OpenSanctions API returned {e.response.status_code}",
"recommended_action": "manual-screening-required",
}
except Exception as e:
return {
"match": "unknown",
"name_queried": name,
"error": f"Sanctions screening failed: {type(e).__name__}",
"recommended_action": "manual-screening-required",
}
if not results:
return {
"match": False,
"name_queried": name,
"screened_sources": "OpenSanctions aggregated (250+ sources)",
"recommended_action": "proceed-with-standard-cdd",
}
top = results[0]
return {
"match": True,
"name_queried": name,
"matched_entity": top.get("caption"),
"score": top.get("score"),
"schema": top.get("schema"),
"datasets": top.get("datasets", []),
"first_seen": top.get("first_seen"),
"recommended_action": "halt-and-escalate",
"source_url": f"https://www.opensanctions.org/entities/{top.get('id', '')}/",
}
1. Explicit timeout (5s). Without it, a slow upstream can hang the agent indefinitely.
2. Three exception classes handled separately. Timeout, HTTP error, generic — each gets a structured response the model can react to.
3. Errors return data, not raise. "match": "unknown" + "recommended_action": "manual-screening-required" tells the model what to do next. Compare to raising an exception, which derails the session.
4. Schema translation. OpenSanctions returns its shape; we return ours. Translation happens at the tool boundary, not in the model.
5. Fail-safe defaults. When screening fails, the recommendation is always "escalate to human" — never "proceed." In compliance, ambiguity must default to more review, not less.
match is now a tri-state: True, False, or the string "unknown". If any code does if sanctions["match"]:, it will treat "unknown" as a hit (non-empty strings are truthy) and then try to read fields like sanctions["list"] that don't exist on the error shape — KeyError.
Update summarize_alert (and any other consumer) to test explicitly:
if sanctions["match"] is True:
# confirmed hit — safe to read list/program/score
...
elif sanctions["match"] == "unknown":
# screening was degraded — flag it, escalate, do NOT treat as clean
...
# else: confirmed no match, proceed
This is a real production pattern: every tri-state return value forces every caller to handle the unknown branch. It's worth the extra elif — a silent miss in compliance has regulatory consequences.
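If several consumers need the same check, you can centralize it in a tiny normalizer so no caller can accidentally fall back to truthiness (`screening_status` is a hypothetical helper, not part of the tutorial code):

```python
def screening_status(result: dict) -> str:
    """Hypothetical normalizer: collapse the tri-state 'match' field
    into one of three explicit statuses, so callers can't treat
    'unknown' as a hit via string truthiness."""
    match = result.get("match")
    if match is True:
        return "hit"
    if match is False:
        return "clear"
    # "unknown", or any unexpected shape: never assume clean
    return "degraded"
```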
Restart Claude Desktop and paste either prompt below into a new chat. Both hit the live API; the first exercises lookup_sanctions_hit directly, the second drives the full summarize_alert chain so you can see the API result flow through to a recommendation.
Screen the individual "Vladimir Putin" for sanctions. Tell me which lists
he appears on, the match score, and the recommended action. Include the
OpenSanctions source URL in your answer.
Expect a halt-and-escalate recommendation, a score near 1.0, ~30+ datasets (OFAC SDN, EU Consolidated, UK HMT, UN, etc.), and a source_url ending in /entities/Q7747/ or similar.
Triage alert ALT-2026-00099: a $75,000 wire transfer to counterparty
"Kim Jong Un" in North Korea (KP). Walk me through the screening result,
every flag that fired, and the recommended action.
Expect three flags on the response: SANCTIONS HIT (from the live API), PROHIBITED JURISDICTION: KP (from your local JURISDICTION_RISK table), and CTR THRESHOLD (amount ≥ $10K). Recommendation: halt-and-escalate.
In a second terminal, tail the MCP log so you can see each call hit OpenSanctions in real time:
# macOS
tail -f ~/Library/Logs/Claude/mcp-server-compliance-toolkit.log | grep -E "sanctions|tool_call"
You'll see your logging.info(...) lines interleaved with the JSON-RPC traffic — cache hits, API outcomes, and error fallbacks. If you used the structured log shape from the build guide (name=... source=... outcome=...), it's grep-friendly out of the box: grep "source=cache" shows everything served from cache; grep "outcome=hit" shows real matches.
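If you want to go beyond grep, the same key=value shape is trivially machine-parseable. A sketch, assuming values contain no spaces or `=` signs:

```python
def parse_log_fields(line: str) -> dict:
    """Parse key=value pairs out of a structured log line, e.g.
    'sanctions_screening name=acme outcome=hit source=api latency_ms=120'.
    Tokens without '=' (like the event name) are skipped."""
    return dict(
        token.split("=", 1)
        for token in line.split()
        if "=" in token
    )
```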
3. Add a local fallback (~5 min)
Real-world systems need to degrade gracefully when an upstream is down. Let's add a fallback to the local SDN_LIST we kept from the original tutorial.
Update lookup_sanctions_hit so each of the three except branches tries the local fallback before falling through to "manual-screening-required". First, add this helper:
def _local_fallback(name: str, reason: str) -> dict:
"""Fallback to the small embedded SDN list when the upstream API is unavailable.
NOT a substitute for the full sanctions check — clearly marks the response
as a degraded reading so investigators know to re-screen when upstream recovers.
"""
key = name.strip().lower()
if key in SDN_LIST:
hit = SDN_LIST[key]
return {
"match": True,
"name_queried": name,
"degraded_reading": True,
"fallback_reason": reason,
"list": hit["list"],
"program": hit["program"],
"added": hit["added"],
"recommended_action": "halt-and-escalate-AND-re-screen-when-upstream-recovers",
}
return {
"match": "unknown",
"name_queried": name,
"degraded_reading": True,
"fallback_reason": reason,
"screened_sources": "embedded fallback only (3 names)",
"recommended_action": "manual-screening-required",
}
Now in lookup_sanctions_hit, replace the three error returns with:
except httpx.TimeoutException:
return _local_fallback(name, "OpenSanctions API timed out after 5s")
except httpx.HTTPStatusError as e:
return _local_fallback(name, f"OpenSanctions API returned {e.response.status_code}")
except Exception as e:
return _local_fallback(name, f"Sanctions screening failed: {type(e).__name__}")
Notice the "degraded_reading": True flag. This is critical for compliance: the model — and the audit log — must know this answer came from a fallback, not the authoritative source. The recommendation explicitly says "re-screen when upstream recovers." A regulator looking at the trace sees the firm continued operating during an outage and committed to re-validating.
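That re-screen commitment also needs a mechanism. A minimal in-memory sketch (hypothetical, not part of the tutorial code; production would persist the queue, e.g. in a database table, so it survives restarts):

```python
_RESCREEN_QUEUE: list[str] = []

def record_for_rescreen(result: dict) -> None:
    """Hypothetical sketch: queue any degraded reading for re-screening
    once the upstream recovers. Only degraded results are queued."""
    if result.get("degraded_reading"):
        _RESCREEN_QUEUE.append(result["name_queried"])

def drain_rescreen_queue(screen) -> list[dict]:
    """Re-run the authoritative screen for every queued name
    (e.g. from a scheduled job once the API is healthy again)."""
    pending = list(_RESCREEN_QUEUE)
    _RESCREEN_QUEUE.clear()
    return [screen(name) for name in pending]
```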
4. Add response caching (~5 min)
Real APIs cost money and have rate limits. Cache identical lookups. For a session-scoped cache, just use a dict:
from time import time
_CACHE: dict[str, tuple[float, dict]] = {}
_CACHE_TTL_SECONDS = 3600 # 1 hour — adjust for compliance freshness requirements
def _cache_get(name: str) -> dict | None:
key = name.strip().lower()
if key not in _CACHE:
return None
ts, value = _CACHE[key]
if time() - ts > _CACHE_TTL_SECONDS:
del _CACHE[key]
return None
return value
def _cache_set(name: str, value: dict) -> None:
_CACHE[name.strip().lower()] = (time(), value)
Wire it into lookup_sanctions_hit — at the top, before any API call:
cached = _cache_get(name)
if cached is not None:
return {**cached, "from_cache": True}
And before each successful return, add _cache_set(name, result):
# Replace the no-match return:
no_match = {
"match": False,
"name_queried": name,
"screened_sources": "OpenSanctions aggregated (250+ sources)",
"recommended_action": "proceed-with-standard-cdd",
}
_cache_set(name, no_match)
return no_match
# And the hit return:
hit = { ... existing dict ... }
_cache_set(name, hit)
return hit
Cache too long → you miss a fresh sanctions designation. Cache too short → you blow rate limits and cost money. 1 hour is fine for live screening; 0 (no cache) for transaction-monitoring alerts. Make the TTL explicit in the audit log so a regulator can see when a cached vs fresh reading was used.
5. Test the failure modes deliberately (~8 min)
The point of this exercise is the failure modes. Force each one:
1. Force a timeout
Change the timeout in your httpx.post call from 5.0 to 0.001. Restart, run a screening. The tool will return the degraded-reading fallback. Watch how Claude handles it — usually it says "the screening was degraded, recommending manual review" and surfaces the limitation transparently.
Revert the timeout.
2. Force a 4xx
Change OPENSANCTIONS_BASE to "https://api.opensanctions.org/nope". The API will return 404. Watch the fallback fire. Notice the degraded reading flag in the result, and how Claude surfaces it.
Revert.
3. Test the cache
Run the same name twice in succession in Claude. Watch the tool-call detail — the second call's response includes "from_cache": true. Latency is markedly lower.
4. Add observability
If you added the audit logging from the main tutorial's "break it" section, log the full path here: cache hit vs miss, API status code or exception class, fallback used or not, latency. This is the substrate for a real compliance audit log.
logging.info(
"sanctions_screening name=%s outcome=%s source=%s latency_ms=%d",
name, result.get("match"), "cache" if cached else "api", elapsed_ms,
)
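The `elapsed_ms` in that log line has to come from somewhere. A minimal timing sketch (`timed_call` is a hypothetical helper, not part of the tutorial code):

```python
import time

def timed_call(fn, *args, **kwargs):
    """Hypothetical helper: run fn and return (result, elapsed_ms).
    perf_counter is monotonic, so the measurement is immune to
    wall-clock adjustments (NTP, DST)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    elapsed_ms = int((time.perf_counter() - start) * 1000)
    return result, elapsed_ms
```

Inside the tool you'd wrap the upstream call, e.g. `response, elapsed_ms = timed_call(httpx.post, url, json=payload, timeout=5.0)`, then pass `elapsed_ms` into the logging call above.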
What you can now say in the interview
"I wired my sanctions-screening tool to a real public API — OpenSanctions — to learn what production tool design looks like. Five patterns that matter. One, explicit timeouts on every external call so a slow upstream can't hang the agent. Two, structured error returns instead of raised exceptions, so the model sees the failure and can choose a different path. Three, a fail-safe default: when screening fails, the recommendation is always 'escalate to human,' never 'proceed' — in compliance, ambiguity defaults to more review, never less. Four, a local fallback that explicitly marks itself as a 'degraded reading' with a flag the model and the audit log can see, so a regulator knows when a fallback was used and that a re-screen was committed to. Five, a TTL'd cache to respect rate limits, but a deliberate decision about TTL — short enough that fresh sanctions designations aren't missed. Every external system you wrap in a tool needs to think through these five things, or the agent silently inherits the upstream's flakiness."