Data & Pipelines
On-chain data flows, indexers, off-chain monitoring, and where the protocol stops and the platform around it begins.
On-chain data flows
Protocol operators consume on-chain data through three primary channels:
- Events. Emitted by contracts; cheap to read; queryable via JSON-RPC eth_getLogs or via a subgraph / indexer (see the sketch after this list).
- State reads. eth_call against the latest (or a pinned) block; expensive at scale.
- Trace reconstruction. Re-execute past blocks for fine-grained, call-level data; very expensive, and requires a full archive node.
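A minimal sketch of the two cheap channels, using ethers v6. The address, the ABI fragment, and the block range are placeholders, not a real deployment:

// Sketch: events vs. state reads, with ethers v6.
import { ethers } from "ethers";

const provider = new ethers.JsonRpcProvider(process.env.RPC_URL);
const PROTOCOL_ADDRESS = "0x0000000000000000000000000000000000000000"; // placeholder

// Channel 1: events via eth_getLogs. Cheap; paginate by block range.
const supplyTopic = ethers.id("Supply(bytes32,address,address,uint256,uint256)");
const logs = await provider.getLogs({
  address: PROTOCOL_ADDRESS,
  topics: [supplyTopic],
  fromBlock: 19_000_000, // placeholder range
  toBlock: 19_000_100,
});

// Channel 2: a state read via eth_call. One round-trip per value read,
// which is why reading storage at every block does not scale.
const abi = ["function totalSupplyAssets(bytes32 id) view returns (uint256)"]; // hypothetical getter
const protocol = new ethers.Contract(PROTOCOL_ADDRESS, abi, provider);
const total = await protocol.totalSupplyAssets(logs[0].topics[1]);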
Senior reflex: every state change in your protocol must emit an event with all relevant fields. A LOG costs 375 gas base, plus 375 gas per indexed topic and 8 gas per byte of data. Skip events and your indexer has to read storage at every block, which is orders of magnitude worse.
event Supply(
Id indexed id,
address indexed caller,
address indexed onBehalf,
uint256 assets,
uint256 shares
);
// Indexed parameters become topics: filterable by exact match, at most 3 per event.
// Non-indexed parameters are ABI-encoded into the data field.
emit Supply(id, msg.sender, onBehalf, assets, shares);
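Off-chain, that topics/data split is exactly what a consumer decodes. A sketch with ethers v6, assuming Id is a bytes32-backed user-defined value type (so the ABI sees bytes32):

// Sketch: decoding a raw Supply log off-chain.
import { ethers } from "ethers";

declare const log: ethers.Log; // a raw log from eth_getLogs, e.g. the sketch above

const iface = new ethers.Interface([
  "event Supply(bytes32 indexed id, address indexed caller, address indexed onBehalf, uint256 assets, uint256 shares)",
]);

// topics[0] is the event-signature hash; topics[1..3] carry the indexed fields.
// The data field holds the ABI-encoded non-indexed fields (assets, shares).
const parsed = iface.parseLog({ topics: [...log.topics], data: log.data });
console.log(parsed?.args.onBehalf, parsed?.args.assets, parsed?.args.shares);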
Subgraph design
A subgraph (The Graph) is a hosted, schema-driven indexer. You write three things: schema, manifest (which contracts and events to track), and mappings (AssemblyScript handlers that translate events to entities).
# schema.graphql
type Market @entity {
id: Bytes! # market id
loanToken: Bytes!
collateralToken: Bytes!
oracle: Bytes!
irm: Bytes!
lltv: BigInt!
totalSupplyAssets: BigInt!
totalBorrowAssets: BigInt!
positions: [Position!]! @derivedFrom(field: "market")
}
type Position @entity {
id: ID! # market.id + "-" + user
market: Market!
user: Bytes!
supplyShares: BigInt!
borrowShares: BigInt!
collateral: BigInt!
}
type Liquidation @entity(immutable: true) {
id: Bytes!
market: Market!
borrower: Bytes!
liquidator: Bytes!
seizedCollateral: BigInt!
repaidAssets: BigInt!
timestamp: BigInt!
}
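The third piece, the manifest, wires the schema and mappings to a contract. A minimal sketch; the address, start block, and file paths are placeholders for a real deployment:

# subgraph.yaml
specVersion: 0.0.5
schema:
  file: ./schema.graphql
dataSources:
  - kind: ethereum
    name: Morpho
    network: mainnet
    source:
      address: "0x0000000000000000000000000000000000000000" # placeholder
      abi: Morpho
      startBlock: 18000000 # placeholder
    mapping:
      kind: ethereum/events
      apiVersion: 0.0.7
      language: wasm/assemblyscript
      file: ./src/mapping.ts
      entities: [Market, Position, Liquidation]
      abis:
        - name: Morpho
          file: ./abis/Morpho.json
      eventHandlers:
        - event: Supply(indexed bytes32,indexed address,indexed address,uint256,uint256)
          handler: handleSupply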
// src/mapping.ts (AssemblyScript)
import { BigInt } from "@graphprotocol/graph-ts";
// Generated bindings; paths follow the dataSource name in the manifest.
import { Supply as SupplyEvent } from "../generated/Morpho/Morpho";
import { Market, Position } from "../generated/schema";

export function handleSupply(event: SupplyEvent): void {
let market = Market.load(event.params.id);
if (market == null) return;
market.totalSupplyAssets = market.totalSupplyAssets.plus(event.params.assets);
market.save();
let posId = event.params.id.toHexString() + "-" + event.params.onBehalf.toHexString();
let pos = Position.load(posId);
if (pos == null) {
pos = new Position(posId);
pos.market = event.params.id;
pos.user = event.params.onBehalf;
pos.supplyShares = BigInt.zero();
pos.borrowShares = BigInt.zero();
pos.collateral = BigInt.zero();
}
pos.supplyShares = pos.supplyShares.plus(event.params.shares);
pos.save();
}
Things senior engineers think about:
- Derived fields (@derivedFrom) save storage but cost query time. Use them for one-to-many relationships (see the query sketch after this list).
- Immutable entities (@entity(immutable: true)) are cheaper to write; use them for append-only data like Liquidations and Trades.
- BigInt / BigDecimal are unavoidable; never use number for token values.
- Reorg handling is built in for subgraphs, but write idempotent handlers (don't add to derived fields that the subgraph already maintains).
- Indexing performance. Avoid contract view calls inside mappings if possible; they slow indexing dramatically. Prefer emitting all needed data in events.
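The query sketch referenced above: walking the @derivedFrom edge from markets into positions. Field names come from the schema earlier; pagination values are arbitrary.

# Top markets by supply, with their open borrow positions.
# The derived positions edge is resolved at query time, not stored.
{
  markets(first: 5, orderBy: totalSupplyAssets, orderDirection: desc) {
    id
    totalSupplyAssets
    totalBorrowAssets
    positions(first: 10, where: { borrowShares_gt: "0" }) {
      user
      borrowShares
      collateral
    }
  }
}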
Indexers — the modern landscape
| Tool | Shape | Strength | Weakness |
|---|---|---|---|
| The Graph (subgraph) | Hosted, AssemblyScript mappings | Battle-tested; large ecosystem | Slower indexing; cost on hosted plan |
| Goldsky | Hosted; subgraph + custom pipelines | Fast indexing; mirror to Postgres/S3 | Vendor lock-in for advanced features |
| Envio | TypeScript/ReScript handlers; fast | Sub-second indexing; multi-chain native | Newer; smaller community |
| Ponder | TypeScript-first, dev-friendly | Modern DX; type-safe schema | Self-host or hosted |
| Custom indexer | Roll your own from eth_getLogs | Full control | Reorg handling, scaling, retries are your problem |
Default to a hosted subgraph (The Graph or Goldsky) for protocol-wide analytics. Reach for Envio / Ponder when you need sub-second freshness (liquidation alerting). Build custom only when your data needs are exotic (re-traces, specific call-level data).
Off-chain monitoring
The on-call rotation needs eyes on the protocol at all times. Real-time monitoring tools:
| Tool | What it does |
|---|---|
| Tenderly | Tx simulation, alerts on events / function calls, mempool monitoring, debug traces |
| OpenZeppelin Defender | Sentinels (alerts), Autotasks (scheduled scripts), Relayers (signed-tx submission) |
| Phalcon (BlockSec) | Tx replay, security monitoring, exploit detection |
| Forta | Distributed detection bots; community + custom |
| Dune / Flipside | SQL on on-chain data; ad-hoc analytics dashboards |
| Custom (Subgraph + Slack) | Webhook on subgraph events; cheap and reliable |
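The last row is often the highest-leverage one. A minimal sketch, assuming the Liquidation entity from earlier, Node 18+ (global fetch), and placeholder URLs and threshold:

// Sketch: poll the subgraph for new liquidations, page Slack on big ones.
const SUBGRAPH_URL = process.env.SUBGRAPH_URL!;   // placeholder
const SLACK_WEBHOOK = process.env.SLACK_WEBHOOK!; // placeholder
const THRESHOLD = 10n ** 23n; // "$X notional" stand-in: 100k tokens at 18 decimals

let lastSeen = Math.floor(Date.now() / 1000);

async function poll(): Promise<void> {
  const query = `{
    liquidations(where: { timestamp_gt: ${lastSeen} }, orderBy: timestamp) {
      id borrower repaidAssets timestamp
    }
  }`;
  const res = await fetch(SUBGRAPH_URL, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ query }),
  });
  const { data } = await res.json();
  for (const liq of data.liquidations) {
    lastSeen = Math.max(lastSeen, Number(liq.timestamp));
    if (BigInt(liq.repaidAssets) > THRESHOLD) {
      await fetch(SLACK_WEBHOOK, {
        method: "POST",
        headers: { "Content-Type": "application/json" },
        body: JSON.stringify({ text: `Large liquidation ${liq.id}, borrower ${liq.borrower}` }),
      });
    }
  }
}

setInterval(poll, 15_000); // subgraphs lag 1-3 blocks anyway; 15s polling is plenty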
Typical alerts for a lending protocol:
- Any liquidation event > $X notional → page on-call.
- Any market crossing 95% utilization → page risk.
- Oracle staleness > threshold → page protocol.
- Any pause-modifier triggered → page everyone.
- Any guardian-key signing event → notify security.
- Any deviation between subgraph state and on-chain state > epsilon → page indexer ops.
Position-health monitoring
For liquidations, the protocol team typically runs (or relies on) services that:
- Index every position and its current health factor.
- Subscribe to oracle price updates.
- For each price update, recompute the set of "now liquidatable" positions.
- Submit liquidation transactions, optionally via a flash loan.
// Pseudocode for a liquidation bot loop
async function loop() {
  oracleClient.on("PriceUpdate", async (market, newPrice) => {
    // Re-fetch on every price update; a one-time snapshot goes stale
    // as positions are opened, closed, and liquidated.
    const positions = await subgraph.getOpenPositions();
    const candidates = positions
      .filter(p => p.market === market)
      .filter(p => healthFactor(p, newPrice) < 1.0)
      .sort((a, b) => expectedProfit(b, newPrice) - expectedProfit(a, newPrice));
    for (const p of candidates) {
      const seize = optimalSeize(p, newPrice);
      const tx = await liquidator.populateTransaction.liquidate(p.borrower, seize, p.market);
      await flashbots.sendBundle([tx]); // private mempool; invisible to competing searchers
    }
  });
}
Most protocol teams do not run the liquidator bots themselves — third-party MEV searchers do that. But the protocol team should monitor whether liquidations are happening promptly. If positions remain underwater for blocks without being liquidated, the liquidation incentive may be too low or the collateral too illiquid.
Oracle & liquidation monitoring
Specific dashboards the protocol team checks daily:
- Oracle freshness. Time since last update per feed, compared against its heartbeat. Heat-mapped.
- Oracle deviation. Price delta between the primary feed and a cross-check (e.g., TWAP, secondary feed).
- L2 sequencer status. On Arbitrum, Optimism, Base: uptime via the Chainlink sequencer-uptime feed (see the sketch after this list).
- Liquidation latency. Time between a position becoming unhealthy (oracle update) and the first liquidation tx landing.
- Bad-debt watermark. Cumulative bad debt socialized across markets. Should be near-zero.
- Utilization heat-map. Each market's utilization, color-coded.
- Per-market PnL. Treasury fee accrual minus losses.
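Two of those checks in code. A sketch with ethers v6 and the standard Chainlink AggregatorV3Interface; feed addresses, heartbeats, and the grace period are per-deployment placeholders:

// Sketch: oracle freshness + L2 sequencer status via Chainlink feeds.
import { ethers } from "ethers";

const AGG_ABI = [
  "function latestRoundData() view returns (uint80 roundId, int256 answer, uint256 startedAt, uint256 updatedAt, uint80 answeredInRound)",
];
const provider = new ethers.JsonRpcProvider(process.env.RPC_URL);

// Freshness: stale if the feed's heartbeat window has elapsed.
async function feedIsFresh(feedAddr: string, heartbeatSec: number): Promise<boolean> {
  const feed = new ethers.Contract(feedAddr, AGG_ABI, provider);
  const { updatedAt } = await feed.latestRoundData();
  return Math.floor(Date.now() / 1000) - Number(updatedAt) <= heartbeatSec;
}

// Sequencer status: answer is 0 when the sequencer is up. Chainlink's docs
// recommend a grace period after recovery before trusting prices again.
async function sequencerOk(uptimeFeedAddr: string, graceSec = 3600): Promise<boolean> {
  const feed = new ethers.Contract(uptimeFeedAddr, AGG_ABI, provider);
  const { answer, startedAt } = await feed.latestRoundData();
  const upFor = Math.floor(Date.now() / 1000) - Number(startedAt);
  return answer === 0n && upFor > graceSec;
}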
Where the off-chain stack meets on-chain reality
A few subtleties the senior engineer carries in their head:
- Subgraph lag. Subgraphs are typically 1-3 blocks behind tip. For UI display this is fine; for liquidator decisions it is not. Use direct RPC reads for time-sensitive decisions.
- Reorgs. An event you saw 2 blocks ago might be re-orged out. Subgraphs handle this transparently but custom indexers must wait for finality (often 12+ blocks on L1) before acting irreversibly.
- Cross-chain consistency. A multi-chain protocol's state is N blockchains' states, asynchronous by design. Aggregate dashboards must clearly attribute per-chain.
- RPC node trust. A compromised RPC provider can serve stale or fake data. For critical systems, query multiple providers and require quorum (sketch after this list).
- Tx submission. Public mempool is a goldfish bowl — every searcher sees your bundle. Private relays (Flashbots, MEV-Share, Beaverbuild) cost nothing extra and protect against frontrunning.
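The quorum read from the list above, sketched with ethers v6; provider URLs and thresholds are placeholders:

// Sketch: require multiple RPC providers to agree before acting.
import { ethers } from "ethers";

const RPC_URLS = [process.env.RPC_A!, process.env.RPC_B!, process.env.RPC_C!];

async function quorumBlockNumber(minAnswers = 2, maxSpread = 2): Promise<number> {
  const results = await Promise.allSettled(
    RPC_URLS.map(u => new ethers.JsonRpcProvider(u).getBlockNumber()),
  );
  const heights = results
    .filter((r): r is PromiseFulfilledResult<number> => r.status === "fulfilled")
    .map(r => r.value);
  if (heights.length < minAnswers) throw new Error("quorum not reached");
  // Providers can legitimately differ by a block at the tip; beyond that,
  // someone is lagging badly or lying.
  const spread = Math.max(...heights) - Math.min(...heights);
  if (spread > maxSpread) throw new Error(`providers disagree by ${spread} blocks`);
  return Math.min(...heights); // act on the most conservative view
}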
If asked "how would you monitor your protocol?" the senior answer has three layers: (1) events flow into an indexer, (2) dashboards/alerts on top of that indexer, (3) parallel direct RPC reads for time-critical checks. Acknowledge that the indexer can lag, and have a fallback. That answer is right for ~90% of protocols.