Data & Pipelines
On-chain data is half the user-visible product. Subgraphs, indexers, event design, and the monitoring stack a senior DEX engineer designs alongside the contracts themselves.
Why smart-contract engineers care about data
Three reasons:
- Event design is irreversible. Once shipped on immutable core, events are forever. Bad event design means bad indexing means bad UX.
- You will be on call for monitoring incidents. Knowing the data pipeline matters when something looks wrong at 3am.
- Front-ends and aggregators consume your event schema. Backwards-incompatible event changes break the ecosystem.
Subgraph design
The Graph is the canonical indexing layer. A subgraph defines entities (Pool, Position, Swap, etc.) and event handlers that mutate them.
Entity shape for a typical DEX subgraph:
```graphql
# schema.graphql
type Factory @entity {
  id: ID!
  poolCount: BigInt!
  totalVolumeUSD: BigDecimal!
}

type Pool @entity {
  id: ID!                          # pool address
  token0: Token!
  token1: Token!
  feeTier: BigInt!
  liquidity: BigInt!
  sqrtPrice: BigInt!
  tick: BigInt
  volumeUSD: BigDecimal!
  totalValueLockedUSD: BigDecimal!
  feesUSD: BigDecimal!
}

type Position @entity {
  id: ID!                          # tokenId or composite
  owner: Bytes!
  pool: Pool!
  tickLower: BigInt!
  tickUpper: BigInt!
  liquidity: BigInt!
  collectedFeesToken0: BigDecimal!
  collectedFeesToken1: BigDecimal!
}

type Swap @entity {
  id: ID!                          # tx hash + log index
  pool: Pool!
  sender: Bytes!
  recipient: Bytes!
  amount0: BigDecimal!
  amount1: BigDecimal!
  amountUSD: BigDecimal!
  sqrtPriceX96: BigInt!
  tick: BigInt!
  timestamp: BigInt!
}
```
Design rules:
- Entity IDs must be deterministic. Don't use sequential counters — they break parallel indexers. Use addresses, txhash+logIndex, or composite keys.
- Denormalize for query speed. Store USD values at write time; don't compute at read time.
- Snapshot at intervals. Hourly and daily aggregates as separate entities. Don't query 1M swaps to compute volume.
- Handle reorgs. The Graph does this for you up to a depth; don't make state changes the subgraph can't unwind.
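To make the rules concrete, a minimal mapping sketch, assuming types generated by graph codegen from the schema above and a v3-style Swap event. File paths, the handler name, and the zeroed USD field are illustrative, not a production implementation:

```typescript
// mappings/pool.ts — AssemblyScript; generated types come from `graph codegen`
import { BigInt, BigDecimal } from "@graphprotocol/graph-ts";
import { Swap as SwapEvent } from "../generated/templates/Pool/Pool";
import { Pool, Swap } from "../generated/schema";

export function handleSwap(event: SwapEvent): void {
  let pool = Pool.load(event.address.toHexString());
  if (pool == null) return; // pool is created by the factory handler first

  // Deterministic ID: txHash + logIndex, never a sequential counter.
  let id = event.transaction.hash.toHexString() + "-" + event.logIndex.toString();
  let swap = new Swap(id);
  swap.pool = pool.id;
  swap.sender = event.params.sender;
  swap.recipient = event.params.recipient;
  // Raw amounts; a production mapping scales these by token decimals
  // and denormalizes the USD value here, at write time.
  swap.amount0 = event.params.amount0.toBigDecimal();
  swap.amount1 = event.params.amount1.toBigDecimal();
  swap.amountUSD = BigDecimal.fromString("0");
  swap.sqrtPriceX96 = event.params.sqrtPriceX96;
  swap.tick = BigInt.fromI32(event.params.tick);
  swap.timestamp = event.block.timestamp;
  swap.save();

  // The event carries post-swap state, so the Pool entity stays current
  // without a single eth_call back to the contract.
  pool.sqrtPrice = event.params.sqrtPriceX96;
  pool.tick = BigInt.fromI32(event.params.tick);
  pool.liquidity = event.params.liquidity;
  pool.save();
  // Hourly/daily snapshot entities (rule 3) would also be updated here.
}
```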
Event design at the core
Every state change on a core contract should emit an event. The schema should:
- Index the searchable fields. Use indexed on user addresses, token addresses, and pool IDs.
- Pack the rest as data. Solidity allows at most three indexed args per event; everything else goes in the data payload.
- Carry derived values when cheap. Emit the new sqrtPriceX96 and tick on every swap so indexers don't have to recompute.
- Be backwards-compatible across upgrades. Once shipped, don't change. Add new events; don't modify old ones.
```solidity
// Canonical v3 swap event — note the careful index choice and the post-swap state
event Swap(
    address indexed sender,
    address indexed recipient,
    int256 amount0,
    int256 amount1,
    uint160 sqrtPriceX96, // POST-swap
    uint128 liquidity,    // POST-swap
    int24 tick            // POST-swap
);
```
Why post-swap state? Because a subgraph that consumes this event in order can reconstruct the entire state of the pool from genesis without ever reading storage.
Volume / TVL / fee accounting
The three numbers every DEX dashboard shows:
| Metric | How it's actually computed |
|---|---|
| Volume | Sum of abs(amount0) (or amount1) on Swap events, converted to USD; aggregated into hourly/daily buckets. |
| TVL | Sum of reserves across all pools × USD price per token. v3 needs LP positions aggregated; v2 just reads token.balanceOf(pool). |
| Fees | Volume × fee tier, minus protocol fee. Per-pool. |
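A worked example of the fee row, with assumed numbers. Note that v3 expresses fee tiers in hundredths of a basis point, so 3000 = 0.30%:

```typescript
// Assumed numbers, for illustration only.
const volumeUSD = 10_000_000;      // one day of volume in a single pool
const feeTier = 3000;              // 0.30%, in hundredths of a bip (1e6 = 100%)
const protocolFeeFraction = 1 / 6; // assumed protocol cut of swap fees

const totalFeesUSD = (volumeUSD * feeTier) / 1_000_000;     // $30,000
const protocolFeesUSD = totalFeesUSD * protocolFeeFraction; // $5,000
const lpFeesUSD = totalFeesUSD - protocolFeesUSD;           // $25,000
```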
Pricing tokens is its own problem. Strategies:
- Find a path to USDC/USDT/DAI on the same chain; quote through that.
- Use a TWAP from the highest-TVL pool involving the token.
- Whitelist a stable set; price everything by routes to that set.
- For long-tail tokens, accept that USD valuation is fuzzy.
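A sketch of that fallback order, with hypothetical types and data access; a real implementation would back the pool lookup with a subgraph or indexer query:

```typescript
// Hypothetical shapes — none of these names come from a real library.
interface PoolRef {
  otherToken: string; // the token on the far side of the pool
  tvlUSD: number;     // for "highest-TVL pool" selection
  price: number;      // price of the input token denominated in otherToken
}

const STABLES = new Set(["USDC", "USDT", "DAI"]); // whitelisted quote set

function priceUSD(
  token: string,
  poolsFor: (token: string) => PoolRef[],
  depth = 0
): number | null {
  if (STABLES.has(token)) return 1; // treat whitelisted stables as $1
  if (depth > 2) return null;       // long-tail: give up rather than guess

  // Prefer the highest-TVL pool; its price is hardest to manipulate.
  const pools = poolsFor(token).sort((a, b) => b.tvlUSD - a.tvlUSD);
  for (const p of pools) {
    const quote = priceUSD(p.otherToken, poolsFor, depth + 1);
    if (quote !== null) return p.price * quote;
  }
  return null; // fuzzy by design: no route to the stable set
}
```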
Off-chain monitoring
What you actually watch in prod:
- Position liquidity drift. If a known whale's position changes outside expected windows, alert.
- Abnormal slippage events. A swap that consumed 10× the expected slippage suggests a thin pool or an attack.
- Fee accumulation health. Fees should grow roughly with volume. A divergence means math is broken or an integrator is gaming.
- Protocol-fee invariants. Treasury accruals match expected % of volume.
- Oracle staleness. Last TWAP update vs current time. If observations stop, the oracle has frozen (a probe sketch follows below).
- Hook failures (v4). Reverts inside hooks have downstream effects — alert.
- L1 ↔ L2 deployment parity. Bytecode and selector tables should match across chains for the canonical deployments.
Tools: Tenderly Alerts, OpenZeppelin Defender, Hypernative, Forta, plus in-house Prometheus scrapers fed by a custom indexer.
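As an example, the staleness probe referenced in the oracle bullet above, sketched with ethers v6 against a v3-style pool. The 30-minute threshold and the console alert sink are assumptions; wire real alerting in production:

```typescript
import { ethers } from "ethers";

// Minimal ABI fragments for a Uniswap-v3-style pool's oracle state.
const POOL_ABI = [
  "function slot0() view returns (uint160 sqrtPriceX96, int24 tick, uint16 observationIndex, uint16 observationCardinality, uint16 observationCardinalityNext, uint8 feeProtocol, bool unlocked)",
  "function observations(uint256) view returns (uint32 blockTimestamp, int56 tickCumulative, uint160 secondsPerLiquidityCumulativeX128, bool initialized)",
];

const STALENESS_THRESHOLD = 30 * 60; // assumed: alert after 30 minutes of silence

async function checkOracleStaleness(provider: ethers.Provider, poolAddress: string) {
  const pool = new ethers.Contract(poolAddress, POOL_ABI, provider);
  const slot0 = await pool.slot0();
  // The most recent observation lives at slot0.observationIndex.
  const obs = await pool.observations(slot0.observationIndex);
  const age = Math.floor(Date.now() / 1000) - Number(obs.blockTimestamp);
  if (age > STALENESS_THRESHOLD) {
    // In production this would page, not log.
    console.warn(`oracle frozen: last observation ${age}s ago on ${poolAddress}`);
  }
}
```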
Aggregator integration
Aggregators consume your contracts. They expect:
- Reliable quoter contracts. Off-chain pricing requires a function that can be staticcalled to return the exact swap result without executing it (see the sketch after this list).
- Stable function signatures. The aggregator's integration breaks the day you ship a new selector.
- Callback-friendly interfaces. Most aggregators call core directly via callbacks; periphery is bypassed.
- Subgraph or REST availability. Aggregators use it to list and rank your pools.
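For the quoter point, a sketch of pulling an exact quote off-chain with ethers v6. The signature mirrors Uniswap v3's QuoterV1; the address and fee tier are placeholders. The quoter is deliberately not a view function (it simulates the swap and unwinds), which is why it must be staticcalled rather than executed:

```typescript
import { ethers } from "ethers";

// Placeholder — substitute the quoter deployment on your chain.
const QUOTER_ADDRESS = "0x...";
const QUOTER_ABI = [
  // QuoterV1-style signature; not a view, so it must be staticcalled.
  "function quoteExactInputSingle(address tokenIn, address tokenOut, uint24 fee, uint256 amountIn, uint160 sqrtPriceLimitX96) returns (uint256 amountOut)",
];

async function quote(
  provider: ethers.Provider,
  tokenIn: string,
  tokenOut: string,
  amountIn: bigint
): Promise<bigint> {
  const quoter = new ethers.Contract(QUOTER_ADDRESS, QUOTER_ABI, provider);
  // staticCall simulates the swap without mutating on-chain state.
  return quoter.quoteExactInputSingle.staticCall(
    tokenIn,
    tokenOut,
    3000, // assumed 0.30% fee tier
    amountIn,
    0     // no price limit
  );
}
```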
| Aggregator | Integration shape |
|---|---|
| 1inch | Pathfinder; off-chain routing; on-chain settler. Direct pool calls. |
| 0x | RFQ + AMM hybrid. Settler contract per chain. |
| ParaSwap | Adapters per DEX type; routes can split across many pools. |
| KyberSwap | Meta-aggregator with own AMM pools as fallback. |
| CoW Swap | Batch auction; solvers compete; settles via Vault contract. |
| Odos | SOR (smart order router); split path optimization. |
| LiFi / Socket / Squid | Cross-chain DEX aggregators; consume yours per-chain. |
Dune / Flipside / on-chain SQL
Senior engineers should be able to write basic Dune queries to investigate incidents. Example: find the top 10 swappers in the last 24h:
```sql
-- Dune SQL (Trino dialect). dex.trades carries pre-computed USD amounts
-- and the originating EOA, so no join against raw event tables is needed.
SELECT
  tx_from AS swapper,
  SUM(amount_usd) AS total_volume_usd
FROM dex.trades
WHERE blockchain = 'ethereum'
  AND project = 'uniswap'
  AND version = '3'
  AND block_time >= now() - interval '24' hour
GROUP BY 1
ORDER BY 2 DESC
LIMIT 10
```
You're not expected to be a data engineer. But knowing how to pull a quick sanity check from chain data is a senior signal.
Observability checklist
For a new core deployment, ensure all of:
- Subgraph spec drafted alongside contract spec.
- Events designed before the first PR.
- Indexer running on testnet before mainnet deploy.
- Volume + TVL dashboards available at launch.
- Alert rules wired to PagerDuty / Slack.
- Anomaly thresholds set conservatively for the first month.
- Runbooks written for the top 5 expected alert types.
When asked "how would you launch a new core release," a complete answer includes the data and monitoring pipeline, not just the contracts. Most candidates omit it.