Hit a Real Sanctions API
Replace the mock SDN list with the OpenSanctions API. Learn what production tool design actually looks like.
You've completed the main MCP build tutorial. The previous stretches (resources, prompts) are not required but recommended.
From demo to real — what changes
The mocked lookup_sanctions_hit tool returned in microseconds with no failure modes. Real-world tool design isn't like that. Real APIs:
- Take 100ms-2s instead of microseconds
- Sometimes return 429 (rate limited), 500 (down), or timeout
- Return data in their schema, not yours
- Have terms of service and rate limits
- Cost money or have free-tier limits
- Need credential handling
Every production MCP tool wraps a flaky upstream system. The patterns you learn here — timeout, retry, fallback, caching, schema mapping — are the patterns for tool design in compliance work.
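The walkthrough below implements timeout, fallback, and caching explicitly; retry is left out to keep the tool short. For completeness, a generic retry-with-backoff sketch looks like this (`with_retries` is a hypothetical helper, not part of the tutorial code):

```python
import random
import time

def with_retries(call, max_attempts=3, base_delay=0.5):
    """Retry a flaky callable with exponential backoff and jitter.

    Retries on any exception for simplicity; a production version would
    retry only transient failures (timeouts, 429, 5xx). The last
    exception is re-raised so the caller's error handling still runs.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return call()
        except Exception:
            if attempt == max_attempts:
                raise
            # 0.5s, 1s, 2s, ... plus jitter, so concurrent clients
            # don't hammer a recovering upstream in lockstep
            time.sleep(base_delay * 2 ** (attempt - 1) + random.uniform(0, 0.1))
```

The jitter matters more than it looks: without it, every client that failed at the same moment retries at the same moment, re-creating the overload that caused the failure.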
OpenSanctions is a free, open-source aggregated sanctions and PEP database. Their API has a free tier with rate limits — perfect for learning. The matching endpoint (/match/sanctions) requires an API key — sign up free at opensanctions.org/account. For production you'd use a commercial vendor (Refinitiv, Dow Jones, ComplyAdvantage) with stricter SLAs and licensing fit for regulated use.
1. Install httpx (~2 min)
We'll use httpx for HTTP. It's a modern alternative to requests with first-class timeout support, and it works cleanly in both sync and async contexts.
pip install httpx
2. Replace the mock with the real call (~10 min)
First, get an OpenSanctions API key. Sign up at opensanctions.org/account and copy the key from your account page. The matching endpoint will return 401 Unauthorized without it.
Wire it into Claude Desktop via the server config (Claude Desktop spawns your server as a subprocess and passes env through):
{
"mcpServers": {
"compliance-toolkit": {
"command": "/absolute/path/to/compliance-mcp/.venv/bin/python",
"args": ["/absolute/path/to/compliance-mcp/server.py"],
"env": {
"OPENSANCTIONS_API_KEY": "your-key-here"
}
}
}
}
For standalone testing with mcp dev, export it in your shell first: export OPENSANCTIONS_API_KEY=your-key-here.
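A missing key otherwise surfaces as a confusing 401 on the first screening call mid-session. One option is to fail fast at startup instead; a minimal sketch (`require_api_key` is a hypothetical helper, not part of the tutorial code):

```python
import os

def require_api_key(env_var: str = "OPENSANCTIONS_API_KEY") -> str:
    """Hypothetical pre-flight check: raise at server startup if the
    key is missing, rather than letting the first tool call 401."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(
            f"{env_var} is not set. Export it in your shell (mcp dev) or "
            "add it to the 'env' block in the Claude Desktop config."
        )
    return key
```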
Open server.py. Add these imports at the top:
import os
import httpx
Now replace the entire lookup_sanctions_hit function with this real-API version. Keep the old SDN_LIST dictionary around — we'll use it as the fallback in step 3.
OPENSANCTIONS_BASE = "https://api.opensanctions.org"
OPENSANCTIONS_API_KEY = os.environ.get("OPENSANCTIONS_API_KEY", "")
@mcp.tool()
def lookup_sanctions_hit(name: str) -> dict:
"""Screen a person or entity name against the live OpenSanctions database
(OFAC SDN, EU Consolidated, UN, and ~250 other sanction sources).
Returns a hit record with match details if found, or an explicit no-match
record otherwise. Use for KYC, transaction counterparty, or onboarding screening.
Args:
name: Full legal name or entity name to screen. Case-insensitive.
"""
payload = {
"queries": {
"q1": {
"schema": "Thing", # Person | Organization | Thing (Thing matches both)
"properties": {"name": [name]},
}
}
}
try:
response = httpx.post(
f"{OPENSANCTIONS_BASE}/match/sanctions",
json=payload,
headers={"Authorization": f"Bearer {OPENSANCTIONS_API_KEY}"},
timeout=5.0,
)
response.raise_for_status()
results = response.json().get("responses", {}).get("q1", {}).get("results", [])
except httpx.TimeoutException:
return {
"match": "unknown",
"name_queried": name,
"error": "OpenSanctions API timed out after 5s — escalate to manual review",
"recommended_action": "manual-screening-required",
}
except httpx.HTTPStatusError as e:
return {
"match": "unknown",
"name_queried": name,
"error": f"OpenSanctions API returned {e.response.status_code}",
"recommended_action": "manual-screening-required",
}
except Exception as e:
return {
"match": "unknown",
"name_queried": name,
"error": f"Sanctions screening failed: {type(e).__name__}",
"recommended_action": "manual-screening-required",
}
if not results:
return {
"match": False,
"name_queried": name,
"screened_sources": "OpenSanctions aggregated (250+ sources)",
"recommended_action": "proceed-with-standard-cdd",
}
top = results[0]
return {
"match": True,
"name_queried": name,
"matched_entity": top.get("caption"),
"score": top.get("score"),
"schema": top.get("schema"),
"datasets": top.get("datasets", []),
"first_seen": top.get("first_seen"),
"recommended_action": "halt-and-escalate",
"source_url": f"https://www.opensanctions.org/entities/{top.get('id', '')}/",
}
1. Explicit timeout (5s). Without it, a slow upstream can hang the agent indefinitely.
2. Three exception classes handled separately. Timeout, HTTP error, generic — each gets a structured response the model can react to.
3. Errors return data, not raise. "match": "unknown" + "recommended_action": "manual-screening-required" tells the model what to do next. Compare to raising an exception, which derails the session.
4. Schema translation. OpenSanctions returns its shape; we return ours. Translation happens at the tool boundary, not in the model.
5. Fail-safe defaults. When screening fails, the recommendation is always "escalate to human" — never "proceed." In compliance, ambiguity must default to more review, not less.
match is now a tri-state: True, False, or the string "unknown". If any code does if sanctions["match"]:, it will treat "unknown" as a hit (non-empty strings are truthy) and then try to read fields like sanctions["list"] that don't exist on the error shape — KeyError.
Update summarize_alert (and any other consumer) to test explicitly:
if sanctions["match"] is True:
# confirmed hit — safe to read list/program/score
...
elif sanctions["match"] == "unknown":
# screening was degraded — flag it, escalate, do NOT treat as clean
...
# else: confirmed no match, proceed
This is a real production pattern: every tri-state return value forces every caller to handle the unknown branch. It's worth the extra elif — a silent miss in compliance has regulatory consequences.
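If several consumers need the same check, you can centralize it in a tiny normalizer so no caller can accidentally fall back to truthiness (`screening_status` is a hypothetical helper, not part of the tutorial code):

```python
def screening_status(result: dict) -> str:
    """Hypothetical normalizer: collapse the tri-state 'match' field
    into one of three explicit statuses, so callers can't treat
    'unknown' as a hit via string truthiness."""
    match = result.get("match")
    if match is True:
        return "hit"
    if match is False:
        return "clear"
    # "unknown", or any unexpected shape: never assume clean
    return "degraded"
```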
Restart Claude Desktop and paste either prompt below into a new chat. Both hit the live API; the first exercises lookup_sanctions_hit directly, the second drives the full summarize_alert chain so you can see the API result flow through to a recommendation.
Screen the individual "Vladimir Putin" for sanctions. Tell me which lists
he appears on, the match score, and the recommended action. Include the
OpenSanctions source URL in your answer.
Expect a halt-and-escalate recommendation, a score near 1.0, ~30+ datasets (OFAC SDN, EU Consolidated, UK HMT, UN, etc.), and a source_url ending in /entities/Q7747/ or similar.
Triage alert ALT-2026-00099: a $75,000 wire transfer to counterparty
"Kim Jong Un" in North Korea (KP). Walk me through the screening result,
every flag that fired, and the recommended action.
Expect three flags on the response: SANCTIONS HIT (from the live API), PROHIBITED JURISDICTION: KP (from your local JURISDICTION_RISK table), and CTR THRESHOLD (amount ≥ $10K). Recommendation: halt-and-escalate.
In a second terminal, tail the MCP log so you can see each call hit OpenSanctions in real time:
# macOS
tail -f ~/Library/Logs/Claude/mcp-server-compliance-toolkit.log | grep -E "sanctions|tool_call"
You'll see your logging.info(...) lines interleaved with the JSON-RPC traffic — cache hits, API outcomes, and error fallbacks. If you used the structured log shape from the build guide (name=... source=... outcome=...), it's grep-friendly out of the box: grep "source=cache" shows everything served from cache; grep "outcome=hit" shows real matches.
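If you want to go beyond grep, the same key=value shape is trivially machine-parseable. A sketch, assuming values contain no spaces or `=` signs:

```python
def parse_log_fields(line: str) -> dict:
    """Parse key=value pairs out of a structured log line, e.g.
    'sanctions_screening name=acme outcome=hit source=api latency_ms=120'.
    Tokens without '=' (like the event name) are skipped."""
    return dict(
        token.split("=", 1)
        for token in line.split()
        if "=" in token
    )
```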
3. Add a local fallback (~5 min)
Real-world systems need to degrade gracefully when an upstream is down. Let's add a fallback to the local SDN_LIST we kept from the original tutorial.
Update lookup_sanctions_hit so each of the three except branches tries the local fallback before falling through to "manual-screening-required". First, add this helper:
def _local_fallback(name: str, reason: str) -> dict:
"""Fallback to the small embedded SDN list when the upstream API is unavailable.
NOT a substitute for the full sanctions check — clearly marks the response
as a degraded reading so investigators know to re-screen when upstream recovers.
"""
key = name.strip().lower()
if key in SDN_LIST:
hit = SDN_LIST[key]
return {
"match": True,
"name_queried": name,
"degraded_reading": True,
"fallback_reason": reason,
"list": hit["list"],
"program": hit["program"],
"added": hit["added"],
"recommended_action": "halt-and-escalate-AND-re-screen-when-upstream-recovers",
}
return {
"match": "unknown",
"name_queried": name,
"degraded_reading": True,
"fallback_reason": reason,
"screened_sources": "embedded fallback only (3 names)",
"recommended_action": "manual-screening-required",
}
Now in lookup_sanctions_hit, replace the three error returns with:
except httpx.TimeoutException:
return _local_fallback(name, "OpenSanctions API timed out after 5s")
except httpx.HTTPStatusError as e:
return _local_fallback(name, f"OpenSanctions API returned {e.response.status_code}")
except Exception as e:
return _local_fallback(name, f"Sanctions screening failed: {type(e).__name__}")
Notice the "degraded_reading": True flag. This is critical for compliance: the model — and the audit log — must know this answer came from a fallback, not the authoritative source. The recommendation explicitly says "re-screen when upstream recovers." A regulator looking at the trace sees the firm continued operating during an outage and committed to re-validating.
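That re-screen commitment also needs a mechanism. A minimal in-memory sketch (hypothetical, not part of the tutorial code; production would persist the queue, e.g. in a database table, so it survives restarts):

```python
_RESCREEN_QUEUE: list[str] = []

def record_for_rescreen(result: dict) -> None:
    """Hypothetical sketch: queue any degraded reading for re-screening
    once the upstream recovers. Only degraded results are queued."""
    if result.get("degraded_reading"):
        _RESCREEN_QUEUE.append(result["name_queried"])

def drain_rescreen_queue(screen) -> list[dict]:
    """Re-run the authoritative screen for every queued name
    (e.g. from a scheduled job once the API is healthy again)."""
    pending = list(_RESCREEN_QUEUE)
    _RESCREEN_QUEUE.clear()
    return [screen(name) for name in pending]
```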
4. Add response caching (~5 min)
Real APIs cost money and have rate limits. Cache identical lookups. For a session-scoped cache, just use a dict:
from time import time
_CACHE: dict[str, tuple[float, dict]] = {}
_CACHE_TTL_SECONDS = 3600 # 1 hour — adjust for compliance freshness requirements
def _cache_get(name: str) -> dict | None:
key = name.strip().lower()
if key not in _CACHE:
return None
ts, value = _CACHE[key]
if time() - ts > _CACHE_TTL_SECONDS:
del _CACHE[key]
return None
return value
def _cache_set(name: str, value: dict) -> None:
_CACHE[name.strip().lower()] = (time(), value)
Wire it into lookup_sanctions_hit — at the top, before any API call:
cached = _cache_get(name)
if cached is not None:
return {**cached, "from_cache": True}
And before each successful return, add _cache_set(name, result):
# Replace the no-match return:
no_match = {
"match": False,
"name_queried": name,
"screened_sources": "OpenSanctions aggregated (250+ sources)",
"recommended_action": "proceed-with-standard-cdd",
}
_cache_set(name, no_match)
return no_match
# And the hit return:
hit = { ... existing dict ... }
_cache_set(name, hit)
return hit
Cache too long → you miss a fresh sanctions designation. Cache too short → you blow rate limits and cost money. 1 hour is fine for live screening; 0 (no cache) for transaction-monitoring alerts. Make the TTL explicit in the audit log so a regulator can see when a cached vs fresh reading was used.
5. Test the failure modes deliberately (~8 min)
The point of this exercise is the failure modes. Force each one:
1. Force a timeout
Change the timeout in your httpx.post call from 5.0 to 0.001. Restart, run a screening. The tool will return the degraded-reading fallback. Watch how Claude handles it — usually it says "the screening was degraded, recommending manual review" and surfaces the limitation transparently.
Revert the timeout.
2. Force a 4xx
Change OPENSANCTIONS_BASE to "https://api.opensanctions.org/nope". The API will return 404. Watch the fallback fire. Notice the degraded reading flag in the result, and how Claude surfaces it.
Revert.
3. Test the cache
Run the same name twice in succession in Claude. Watch the tool-call detail — the second call's response includes "from_cache": true. Latency is markedly lower.
4. Add observability
If you added the audit logging from the main tutorial's "break it" section, log the full path here: cache hit vs miss, API status code or exception class, fallback used or not, latency. This is the substrate for a real compliance audit log.
logging.info(
"sanctions_screening name=%s outcome=%s source=%s latency_ms=%d",
name, result.get("match"), "cache" if cached else "api", elapsed_ms,
)
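The `elapsed_ms` in that log line has to come from somewhere. A minimal timing sketch (`timed_call` is a hypothetical helper, not part of the tutorial code):

```python
import time

def timed_call(fn, *args, **kwargs):
    """Hypothetical helper: run fn and return (result, elapsed_ms).
    perf_counter is monotonic, so the measurement is immune to
    wall-clock adjustments (NTP, DST)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    elapsed_ms = int((time.perf_counter() - start) * 1000)
    return result, elapsed_ms
```

Inside the tool you'd wrap the upstream call, e.g. `response, elapsed_ms = timed_call(httpx.post, url, json=payload, timeout=5.0)`, then pass `elapsed_ms` into the logging call above.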
What you can now say in the interview
"I wired my sanctions-screening tool to a real public API — OpenSanctions — to learn what production tool design looks like. Five patterns that matter. One, explicit timeouts on every external call so a slow upstream can't hang the agent. Two, structured error returns instead of raised exceptions, so the model sees the failure and can choose a different path. Three, a fail-safe default: when screening fails, the recommendation is always 'escalate to human,' never 'proceed' — in compliance, ambiguity defaults to more review, never less. Four, a local fallback that explicitly marks itself as a 'degraded reading' with a flag the model and the audit log can see, so a regulator knows when a fallback was used and that a re-screen was committed to. Five, a TTL'd cache to respect rate limits, but a deliberate decision about TTL — short enough that fresh sanctions designations aren't missed. Every external system you wrap in a tool needs to think through these five things, or the agent silently inherits the upstream's flakiness."