Run the Server Over HTTP
Switch from stdio to Streamable HTTP. Experience both transports and answer "when do I pick which?" with conviction.
You've completed the main MCP build tutorial. You should have a working server.py in your compliance-mcp directory.
Why HTTP exists
Stdio is great for local, single-user, host-spawns-server scenarios. It's not great for:
- Multi-user — many clients connecting to one shared server
- Multi-host — one team's tools, many laptops
- Centralized state — the server holds session data, caches, locks
- Production observability — standard HTTP infra: load balancing, mTLS, request logs
- Cross-language — clients in any language can speak HTTP
MCP's Streamable HTTP transport is a single endpoint that handles both regular requests (JSON-RPC over HTTP POST) and streaming (Server-Sent Events for long-running operations). It replaces the older SSE-based pattern with a simpler one-endpoint design.
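Concretely, "JSON-RPC over HTTP POST" means every request is a small JSON envelope sent to that single endpoint. A minimal sketch of a tools/call request body (jsonrpc, id, method, and params are the JSON-RPC/MCP field names; the tool name and arguments are placeholders):

```python
import json

# The envelope a client POSTs to the /mcp endpoint for one tool call.
# Tool name and arguments are placeholders for whatever your server exposes.
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "lookup_sanctions_hit",
        "arguments": {"name": "Vladimir Petrov"},
    },
}

body = json.dumps(request)
print(body)
```

The same envelope works over stdio; only the pipe it travels through changes.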
| | stdio | HTTP |
|---|---|---|
| Who runs it | Host spawns as child process | Long-lived service, separately managed |
| Auth | Process boundary | Bring your own (JWT, OAuth, mTLS) |
| Multi-user | No — one process per user | Yes — N clients share one server |
| Observability | Process logs, manual | Standard HTTP infra |
| Network surface | Zero | One port — must be secured |
| Fit for | Developer tools, single-user agents | Production compliance services |
Step 1: Run as HTTP server (~5 min)
FastMCP supports both transports with a single line change. First, flip the constructor at the top of server.py to enable stateless HTTP — this lets each request stand on its own without an initialize handshake and a session ID. It's the simpler mode for pure tool servers and is what makes the single-curl tests in step 3 work:
# Was: mcp = FastMCP("compliance-toolkit")
mcp = FastMCP("compliance-toolkit", stateless_http=True)
Then add a transport switch at the bottom of the file so the same script runs as either stdio or HTTP:
if __name__ == "__main__":
    import sys

    transport = sys.argv[1] if len(sys.argv) > 1 else "stdio"
    if transport == "http":
        mcp.run(transport="streamable-http")
    else:
        mcp.run()  # stdio (default)
With the default (stateful) Streamable HTTP, every request after the first must include an Mcp-Session-Id header that the server returns from initialize. Skip the handshake and the server replies Bad Request: Missing session ID. Stateless mode (stateless_http=True) drops that requirement — each request is independent. When to use which: stateful is right when the server holds per-session state (auth tokens scoped to a session, in-flight subscriptions, long-running operations). Stateless is right for pure tool servers where every call is idempotent and self-describing — most compliance/scoring/screening tools fit this profile, and it's much easier to scale horizontally.
Now you can launch the same server in either mode:
# stdio mode (what Claude Desktop uses):
python server.py
# HTTP mode (long-lived service):
python server.py http
Run the HTTP mode. By default FastMCP serves on http://localhost:8000/mcp. You should see startup logs in your terminal.
For production you'd configure these explicitly via environment variables or FastMCP options. For learning, the defaults are fine.
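When you do want explicit configuration, the usual pattern is environment variables with defaults. A minimal stdlib sketch (the MCP_HOST/MCP_PORT names are illustrative, not a FastMCP convention; check your installed SDK version for how host/port are actually passed in):

```python
import os

# Illustrative variable names; align them with your deployment standard.
host = os.environ.get("MCP_HOST", "localhost")
port = int(os.environ.get("MCP_PORT", "8000"))

# Then hand them to FastMCP; the exact spelling depends on your SDK version,
# e.g. something like: FastMCP("compliance-toolkit", host=host, port=port)
print(f"would serve on http://{host}:{port}/mcp")
```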
Step 2: Test with the inspector (~5 min)
Open a second terminal (the first is running the server). Point the MCP inspector at your HTTP endpoint:
npx @modelcontextprotocol/inspector
In the inspector UI:
- Set Transport to Streamable HTTP
- Set URL to http://localhost:8000/mcp
- Click Connect
You should see the same three tools, two resources, and (if you did stretch 04b) two prompts as before. Invoke summarize_alert with the standard payload — it works identically.
The server is the same code. The protocol is the same JSON-RPC. The only difference is transport: instead of bytes flowing through pipes between a parent and child process, they're flowing through TCP between two separate processes — potentially on different machines. That's the whole conceptual difference between stdio and HTTP. Everything else is plumbing.
Step 3: Add JWT auth, the production-ready piece (~15 min)
An HTTP MCP server with no auth is a wide-open door to your compliance tools. Production deployments need auth. Let's add a minimal JWT-bearer pattern.
Install PyJWT:
pip install pyjwt
Create a separate file mint_token.py for issuing test tokens:
"""Mint a JWT for testing the compliance-toolkit HTTP server.
In production this lives in a server-side mint endpoint with proper user auth,
short TTLs, and scoping. For learning, we'll mint a 1-hour token for a fake user.
"""
import jwt
from datetime import datetime, timedelta, timezone
SECRET = "dev-secret-do-not-use-in-production" # in prod: env var, KMS, etc.
def mint(user_id: str, allowed_tools: list[str], ttl_minutes: int = 60) -> str:
    now = datetime.now(timezone.utc)
    payload = {
        "sub": user_id,
        "iat": int(now.timestamp()),
        "exp": int((now + timedelta(minutes=ttl_minutes)).timestamp()),
        "scope": {"tools": allowed_tools},
        "purpose": "compliance-toolkit-access",
    }
    return jwt.encode(payload, SECRET, algorithm="HS256")

if __name__ == "__main__":
    # Issue a token that allows all tools
    token = mint(
        user_id="analyst.demo@example.com",
        allowed_tools=["lookup_sanctions_hit", "check_jurisdiction_risk", "summarize_alert"],
    )
    print(token)
Run it once to get a token:
python mint_token.py
# eyJhbGciOiJIUzI1NiIs... (copy this)
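To see what that opaque string actually is: a JWT is three unpadded base64url segments (header, payload, signature) joined by dots. A stdlib-only sketch that builds the same HS256 structure by hand, for understanding only; keep using PyJWT in real code:

```python
import base64
import hashlib
import hmac
import json

SECRET = b"dev-secret-do-not-use-in-production"

def b64url(data: bytes) -> str:
    # JWT uses unpadded base64url encoding
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()

header = b64url(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
payload = b64url(json.dumps({"sub": "analyst.demo@example.com"}).encode())
signing_input = f"{header}.{payload}".encode()
# The signature is an HMAC-SHA256 over "header.payload" keyed by the secret.
signature = b64url(hmac.new(SECRET, signing_input, hashlib.sha256).digest())

token = f"{header}.{payload}.{signature}"
print(token.count("."))  # 2 -- three dot-separated segments
```

Anyone can base64-decode the payload and read the claims; only the signature, which requires the secret, proves the server minted them. That's why the secret must never ship to clients.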
Now add an HTTP middleware-style check inside your tools. The cleanest pattern with FastMCP is a small auth decorator:
import jwt
from contextvars import ContextVar
from functools import wraps
from mcp.server.fastmcp import Context
SECRET = "dev-secret-do-not-use-in-production"
# Threaded through the request via ContextVar so tool signatures stay clean.
# FastMCP rejects tool parameters that start with '_', so we can't pass the
# auth identity as a function kwarg. Tools read it via current_auth_user().
_current_auth_user: ContextVar[str] = ContextVar("_current_auth_user", default="anonymous")
def current_auth_user() -> str:
    """Return the authenticated subject ('sub' claim) for the in-flight tool call."""
    return _current_auth_user.get()

def require_auth(tool_name: str):
    """Decorator: validate a Bearer JWT from the request headers and check scope."""
    def deco(fn):
        @wraps(fn)
        def wrapper(*args, ctx: Context = None, **kwargs):
            # FastMCP makes the request available via ctx.request_context
            headers = {}
            if ctx and hasattr(ctx, "request_context"):
                req = ctx.request_context.request
                if req:
                    headers = dict(req.headers)
            auth = headers.get("authorization", "")
            if not auth.startswith("Bearer "):
                return {"error": "missing or malformed Authorization header", "isError": True}
            token = auth.removeprefix("Bearer ").strip()
            try:
                payload = jwt.decode(token, SECRET, algorithms=["HS256"])
            except jwt.ExpiredSignatureError:
                return {"error": "token expired", "isError": True}
            except jwt.InvalidTokenError as e:
                return {"error": f"invalid token: {e}", "isError": True}
            scope_tools = payload.get("scope", {}).get("tools", [])
            if tool_name not in scope_tools:
                return {
                    "error": f"token does not authorize tool '{tool_name}'",
                    "authorized": scope_tools,
                    "isError": True,
                }
            token_ref = _current_auth_user.set(payload["sub"])
            try:
                return fn(*args, **kwargs)
            finally:
                _current_auth_user.reset(token_ref)
        return wrapper
    return deco
Apply it to one tool to see the pattern. This is a two-step refactor of the lookup_sanctions_hit you already have: (1) rename the existing function — body unchanged — to a private helper _do_screening; (2) create a new public lookup_sanctions_hit that wraps the helper with @require_auth and stamps the caller's identity onto the response for the audit trail.
# Was: @mcp.tool()
# Was: def lookup_sanctions_hit(name: str) -> dict:
def _do_screening(name: str) -> dict:
    """Existing screening logic — cache, OpenSanctions API call, fallback.

    No changes to the body; just renamed and the @mcp.tool() decorator removed.
    """
    cached = _cache_get(name)
    if cached is not None:
        return {**cached, "from_cache": True}
    # ... rest of your existing cache + API + fallback logic unchanged ...
    # (the body you wrote in the real-API stretch)

@mcp.tool()
@require_auth("lookup_sanctions_hit")
def lookup_sanctions_hit(name: str, ctx: Context = None) -> dict:
    """Screen a person or entity name against the live OpenSanctions database.

    Requires a Bearer JWT in the Authorization header with 'lookup_sanctions_hit'
    in its tool scope. The caller's identity is recorded on every response.
    """
    result = _do_screening(name)
    result["screened_by"] = current_auth_user()  # audit trail!
    return result
1. The tool function must declare ctx: Context = None. FastMCP introspects the inner function's signature (via functools.wraps) to decide which framework-injected parameters to pass. If ctx isn't in the signature, FastMCP won't inject one, the wrapper's headers dict stays empty, and every call fails with "missing or malformed Authorization header" — even when you sent a valid token. Context-typed parameters are auto-injected by the framework and excluded from the JSON schema the model sees, so the model can't pass anything for ctx.
2. Don't pass auth identity via a kwarg. FastMCP rejects tool parameters whose name starts with _ at registration time (InvalidSignature: Parameter _auth_user of lookup_sanctions_hit cannot start with '_'). A regular kwarg like auth_user is no better: the model would see it as a tool argument it can supply, which lets it impersonate any user. The ContextVar pattern above keeps the tool's public signature clean while still threading identity through the request.
3. Decorator order matters. @mcp.tool() must sit above @require_auth(...), so FastMCP registers the auth-wrapped callable rather than the raw one.
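That ordering rule is plain Python, not FastMCP behavior: decorators apply bottom-up, so the topmost decorator receives the already-wrapped callable. A generic sketch with a toy registry standing in for @mcp.tool() (none of this is real FastMCP code):

```python
from functools import wraps

REGISTRY = {}  # stand-in for what @mcp.tool() does internally

def register(fn):
    # Whatever callable reaches this decorator is what gets "served".
    REGISTRY[fn.__name__] = fn
    return fn

def guard(fn):
    @wraps(fn)
    def wrapper(*args, **kwargs):
        return {"guarded": True, "result": fn(*args, **kwargs)}
    return wrapper

@register          # topmost: registers the guarded wrapper
@guard             # closest to the function: wraps first
def good(x):
    return x + 1

@guard             # wrong order: guard wraps the return value of register...
@register          # ...but REGISTRY already holds the raw, unguarded function
def bad(x):
    return x + 1

print(REGISTRY["good"](1))  # {'guarded': True, 'result': 2}
print(REGISTRY["bad"](1))   # 2 -- the guard never runs for registered calls
```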
Test it with curl from the second terminal. The Accept header is required — MCP's Streamable HTTP transport rejects requests that don't advertise support for both JSON and SSE responses (Not Acceptable error otherwise):
# Replace TOKEN with the JWT you minted
TOKEN="eyJhbGc..."
# Should succeed with that token's scope:
curl -X POST http://localhost:8000/mcp \
-H "Authorization: Bearer $TOKEN" \
-H "Accept: application/json, text/event-stream" \
-H "Content-Type: application/json" \
-d '{"jsonrpc":"2.0","id":1,"method":"tools/call","params":{"name":"lookup_sanctions_hit","arguments":{"name":"Vladimir Petrov"}}}'
# Should fail with no token (the tool returns an auth error payload):
curl -X POST http://localhost:8000/mcp \
-H "Accept: application/json, text/event-stream" \
-H "Content-Type: application/json" \
-d '{"jsonrpc":"2.0","id":1,"method":"tools/call","params":{"name":"lookup_sanctions_hit","arguments":{"name":"Vladimir Petrov"}}}'
The key idea: JWT validation happens before the tool runs. The token carries the user identity (which goes into the audit log) and the scope (which tools this token authorizes). A short TTL means a leaked token's blast radius is bounded. And because every token derives from the signing secret on the mint side, rotating that secret invalidates all outstanding tokens at once.
The SDK details for accessing request headers in FastMCP evolve quickly — if the above doesn't fit your installed version, check the official docs. The pattern (verify-before-execute, scope-per-token, user-in-audit) is universal.
Step 4: Now you can answer "stdio vs HTTP" with conviction (~2 min)
You've run both transports. The pros and cons are no longer abstract:
Stdio: the user is on their laptop running Claude Desktop. There's no shared state. There's no need for separate auth — the OS user identity is the identity. Operations are local to the user (their files, their workspace, their session). Examples: developer tools, single-user productivity agents.
HTTP: multiple users share the server. The server hits production systems (compliance databases, case management, vendor APIs). You need centralized observability, rate limiting, secrets management. You need to enforce auth, scope, and audit. Examples: every production compliance MCP server.
What you can now say in the interview
"I've run the same MCP server in both transports — stdio for local single-user, HTTP for production-ready. The decision is straightforward once you've done it. Stdio is right when the host can spawn the server as a child process and the OS user identity is the auth — developer tools, single-user agents. HTTP is right whenever you need multiple clients, centralized state, real observability, or non-OS-level auth — every production compliance use case. The piece that matters most is auth: HTTP needs a real story, and I wired up the JWT pattern — short-lived bearer token, scope-per-token specifying which tools it authorizes, user identity in every audit log entry, and a mint endpoint that holds the long-lived credential so it never leaves the server. The blast radius of a leaked HTTP MCP token is bounded by the scope and the TTL. That's the difference between 'we exposed a tool over HTTP' and 'we exposed a tool over HTTP in a way Compliance would approve.'"