Governance & Audit
The lifecycle from "code is done" to "code is on mainnet and we're sleeping at night." Audits, governance attack surfaces, upgradeability, and incident response.
The audit cycle
A protocol-grade audit cycle for a non-trivial change runs something like:
- Internal review. Other senior protocol engineers on your own team read the diff. Often the most valuable single step.
- Targeted invariant / fuzzing campaign. Long-running runs of Echidna / Foundry invariant tests on the change. Run for days, not hours.
- External audit — auditor A. A reputable firm (Spearbit, Cantina, OpenZeppelin, Trail of Bits, ChainSecurity) reviews the diff over 2-6 weeks. Output: findings report.
- Fix & re-review. You fix the high/medium findings; auditor A re-reviews the fixes.
- External audit — auditor B. A second, independent audit firm reviews the (now patched) code. Two auditors catch different bug classes.
- Formal verification engagement. Certora or equivalent writes specs against the implementation. Often runs in parallel with audits.
- Public contest. Code4rena / Sherlock / Cantina hosts an open audit contest. Hundreds of eyes, capped reward pool.
- Bug bounty live. Immunefi or similar; the deployed code stays under an ongoing bounty.
- Mainnet deploy. Often with TVL caps, isolated markets, or staged rollout.
- Ongoing monitoring & on-call. Real-time alerting; on-call rotation; incident response playbooks.
Audits are not a substitute for design rigor; they are the final filter on code that you already believe is correct. A protocol that ships its first draft to audit, then patches whatever comes back, is doing audits wrong. The internal-review and invariant-campaign steps catch the easy bugs at a small fraction of the cost of an external engagement.
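The invariant-campaign step can be sketched as an Echidna property harness. This is a minimal sketch, not a production harness: `ToyLedger`, `ToyLedgerInvariants`, and the property name are hypothetical stand-ins for the contract under test, and the invariant (tracked balances sum to `totalDeposits`) is only an example of the kind of property worth running for days.

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.20;

/// Hypothetical contract under test: a toy deposit ledger with no token transfers.
contract ToyLedger {
    mapping(address => uint256) public balanceOf;
    uint256 public totalDeposits;

    function deposit(uint256 amount) public virtual {
        balanceOf[msg.sender] += amount;
        totalDeposits += amount;
    }

    function withdraw(uint256 amount) public virtual {
        balanceOf[msg.sender] -= amount; // checked arithmetic: reverts if amount > balance
        totalDeposits -= amount;
    }
}

/// Harness: Echidna calls the public functions in random sequences and reports
/// any sequence after which an `echidna_`-prefixed property returns false.
contract ToyLedgerInvariants is ToyLedger {
    address[] internal actors; // every address that has ever deposited
    mapping(address => bool) internal seen;

    function deposit(uint256 amount) public override {
        if (!seen[msg.sender]) {
            seen[msg.sender] = true;
            actors.push(msg.sender);
        }
        super.deposit(amount);
    }

    /// Invariant: the ledger total equals the sum of all individual balances.
    function echidna_balances_sum_to_total() public view returns (bool) {
        uint256 sum;
        for (uint256 i = 0; i < actors.length; ++i) {
            sum += balanceOf[actors[i]];
        }
        return sum == totalDeposits;
    }
}
```

Overriding `deposit` rather than adding a second entry point keeps the fuzzer from bypassing the actor tracking, which would otherwise produce false invariant failures.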
Multi-auditor strategy
Why two (or more) audits? Different shops have different strengths:
| Shop | Strength |
|---|---|
| Spearbit / Cantina | Senior individual auditors; mechanism-design fluent |
| OpenZeppelin | Process-heavy; deep on standards and upgradeability |
| Trail of Bits | Tooling-heavy (Echidna, Slither); systems-engineering mindset |
| ChainSecurity | Strong on formal methods; rigorous |
| Certora | Formal verification specialist (not a traditional audit) |
| Runtime Verification | K framework / Kontrol; formal background |
| Code4rena / Sherlock / Cantina contests | Wide-net public reviews; great for surfacing unknowns |
| Immunefi bounty | Continuous bounty for post-mainnet finds |
Sequence them carefully — running two audits in parallel costs more in protocol-team bandwidth (you can't fix-and-re-review with both at once). Two sequential audits with FV in parallel is the modern norm for serious launches.
Severity rubric
You should be able to classify findings without looking at the rubric.
| Severity | Definition | Examples |
|---|---|---|
| Critical | Direct loss of funds or protocol insolvency, exploitable | Reentrancy that drains pool; oracle bug allowing free borrow |
| High | Significant loss possible under specific conditions | Donation attack; griefing of liquidations; bad-debt accumulation under tail event |
| Medium | Limited loss / griefing; or critical impact under unlikely conditions | Suboptimal rounding leaking dust; permit signature replay across chains |
| Low | Best-practice violation, minimal direct impact | Missing event emission; redundant checks; gas suggestions |
| Informational | Code-quality, style | Naming, comment improvements |
| Gas | Optimizations | SLOAD caching; storage layout reordering |
Sherlock / Cantina / Code4rena have minor variations (e.g., Sherlock weights severity by exploitability AND impact; Code4rena uses 3-tier "H/M/QA"). Read the platform's rubric before triaging.
Audit-report triage
When the report lands, you have N findings to deal with. The process:
- Read every finding to the bottom. Do not skim. The interesting finding is often labeled "low" with a subtle root cause.
- Reproduce the issue with a test. Until you can reproduce, you don't understand the bug.
- Classify your response:
  - Acknowledge & fix. Straightforward.
  - Acknowledge, won't fix. The issue is real but a deliberate trade-off; document why.
  - Dispute. The finding is incorrect; explain. Auditors update or remove.
  - Out of scope. The finding is about code not in the audit scope.
- Write the fix; add a regression test. The test is the proof the bug doesn't come back.
- Submit a fix PR; auditor re-reviews. Iterate until clean.
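The "reproduce, fix, add a regression test" steps above can be sketched for a typical rounding finding. This is an illustrative sketch under stated assumptions: `ShareMath`, both function names, and the numbers are hypothetical and not taken from any specific audit report. The first test pins down the reported behaviour; the second is the regression test that stays in the suite.

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.20;

/// Hypothetical vault share math; reverts (division by zero) if totalAssets is 0.
library ShareMath {
    /// Pre-fix behaviour: rounds down, so a small withdrawal can burn zero shares.
    function sharesToBurnRoundDown(uint256 assets, uint256 totalAssets, uint256 totalShares)
        internal
        pure
        returns (uint256)
    {
        return (assets * totalShares) / totalAssets;
    }

    /// Fixed behaviour: rounds up, so rounding always favours the vault.
    function sharesToBurnRoundUp(uint256 assets, uint256 totalAssets, uint256 totalShares)
        internal
        pure
        returns (uint256)
    {
        return (assets * totalShares + totalAssets - 1) / totalAssets;
    }
}

/// Regression tests written from the finding's reproduction.
contract WithdrawRoundingRegression {
    function test_reproducesFinding() external pure {
        // 3 assets back 2 shares; withdrawing 1 asset burned 0 shares before the fix.
        assert(ShareMath.sharesToBurnRoundDown(1, 3, 2) == 0);
    }

    function test_fixBurnsAtLeastOneShare() external pure {
        // ceil(1 * 2 / 3) = 1
        assert(ShareMath.sharesToBurnRoundUp(1, 3, 2) == 1);
        // Exact divisions are unchanged by the fix: 3 * 2 / 3 = 2.
        assert(ShareMath.sharesToBurnRoundUp(3, 3, 2) == 2);
    }
}
```

Keeping the reproduction as its own test documents what the auditor actually found, separately from the assertion that the fix holds.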
If asked "tell me about an audit finding you disagreed with," the senior answer walks through: (1) what the finding said, (2) what your model was, (3) what you presented to the auditor, (4) how the conversation resolved. The interviewer is checking that you can hold a technical position respectfully against an expert.
Governance attack surfaces
Six vectors, repeated from chapter 08 with structural fixes:
| Attack | Structural mitigation |
|---|---|
| Flash governance (borrow tokens, vote, repay in same tx) | Use voting power from a past block snapshot, not current balance |
| Bribery markets / vote-buying | Long-lock vote-escrow (veToken); reduce parameters set by governance |
| Multisig compromise | Hardware signers, threshold > 50%, geographic distribution, audit signer set |
| Timelock bypass | Every privileged function behind the same timelock; no special "fast" admin functions |
| Parameter-griefing | Allow-list of parameters; immutable per-market; minimize the surface |
| Proposal smuggling (malicious calldata) | Pre-execute simulation; community-readable proposals; avoid opaque execute(bytes) payloads |
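The flash-governance mitigation in the first row can be sketched with per-block checkpoints. This is a minimal sketch, not a full governance token: `SnapshotVotes`, the unrestricted `mint`, and the `uint64`/`uint192` packing are assumptions made for illustration. The point it shows is that voting power read at a past block ignores anything acquired in the current block.

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.20;

/// Hypothetical minimal vote ledger: balances are checkpointed per block so a
/// governor can read voting power as of a past block.
contract SnapshotVotes {
    struct Checkpoint {
        uint64 fromBlock;
        uint192 votes;
    }

    mapping(address => Checkpoint[]) private _checkpoints;

    /// Unrestricted mint, for illustration only.
    function mint(address to, uint256 amount) external {
        uint256 newVotes = currentVotes(to) + amount;
        require(newVotes <= type(uint192).max, "votes overflow");
        Checkpoint[] storage ckpts = _checkpoints[to];
        uint256 n = ckpts.length;
        if (n > 0 && ckpts[n - 1].fromBlock == block.number) {
            ckpts[n - 1].votes = uint192(newVotes); // same block: overwrite
        } else {
            ckpts.push(Checkpoint(uint64(block.number), uint192(newVotes)));
        }
    }

    function currentVotes(address account) public view returns (uint256) {
        Checkpoint[] storage ckpts = _checkpoints[account];
        return ckpts.length == 0 ? 0 : ckpts[ckpts.length - 1].votes;
    }

    /// Voting power at the end of `blockNumber`, which must already be mined.
    function getPastVotes(address account, uint256 blockNumber) external view returns (uint256) {
        require(blockNumber < block.number, "block not yet mined");
        Checkpoint[] storage ckpts = _checkpoints[account];
        // Binary search for the first checkpoint strictly after `blockNumber`.
        uint256 lo = 0;
        uint256 hi = ckpts.length;
        while (lo < hi) {
            uint256 mid = (lo + hi) / 2;
            if (ckpts[mid].fromBlock > blockNumber) hi = mid;
            else lo = mid + 1;
        }
        return lo == 0 ? 0 : ckpts[lo - 1].votes;
    }
}
```

A governor that fixes each proposal's snapshot at a block before voting opens, and counts votes with `getPastVotes(voter, snapshotBlock)`, gives zero weight to tokens borrowed inside the voting transaction.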
The deepest defense: minimize the number of things governance can do. Every privileged function is an attack surface. A core that cannot be upgraded, with markets whose parameters are immutable, has shrunk governance to the smallest possible surface.
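For the privileged functions that do remain, routing every one of them through a single delay can be sketched as follows. This is an illustrative sketch under stated assumptions: `MiniTimelock` and `FeeParams` are hypothetical, there is no cancel path or grace period, and a production system would more likely use an audited timelock than a hand-rolled one.

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.20;

/// Hypothetical minimal timelock: the admin (e.g. a governor) queues a call,
/// and anyone may execute it once the delay has elapsed.
contract MiniTimelock {
    error NotAdmin();
    error NotQueued();
    error TooEarly(uint256 eta);
    error CallFailed();

    event Queued(bytes32 indexed id, address target, bytes data, uint256 eta);
    event Executed(bytes32 indexed id);

    address public immutable admin;
    uint256 public immutable delay;
    mapping(bytes32 => uint256) public etaOf;

    constructor(address admin_, uint256 delay_) {
        admin = admin_;
        delay = delay_;
    }

    function queue(address target, bytes calldata data) external returns (bytes32 id) {
        if (msg.sender != admin) revert NotAdmin();
        id = keccak256(abi.encode(target, data));
        uint256 eta = block.timestamp + delay;
        etaOf[id] = eta;
        emit Queued(id, target, data, eta);
    }

    function execute(address target, bytes calldata data) external {
        bytes32 id = keccak256(abi.encode(target, data));
        uint256 eta = etaOf[id];
        if (eta == 0) revert NotQueued();
        if (block.timestamp < eta) revert TooEarly(eta);
        delete etaOf[id]; // clear before the call so it cannot be replayed
        (bool ok, ) = target.call(data);
        if (!ok) revert CallFailed();
        emit Executed(id);
    }
}

/// Hypothetical governed parameter: its only setter accepts calls from the timelock.
contract FeeParams {
    address public immutable timelock;
    uint256 public feeBps;

    constructor(address timelock_) {
        timelock = timelock_;
    }

    function setFeeBps(uint256 newFeeBps) external {
        require(msg.sender == timelock, "not timelock");
        feeBps = newFeeBps;
    }
}
```

Because `FeeParams` has no second admin path, the delay cannot be bypassed; users watching the `Queued` event have the full delay to react or exit.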
Upgradeability trade-offs in depth
Covered structurally in chapter 06; here, the audit perspective:
- Transparent proxy separates admin and user paths so admin calls can't accidentally fall through. Larger contract, two storage slots reserved.
- UUPS moves upgrade logic into the implementation. Smaller proxy. Risk: an upgrade that removes the upgrade function bricks the contract. Always include the upgrade function in every new impl.
- Beacon centralizes upgrades across N instances. Lower per-instance gas; higher blast radius.
- Diamond (EIP-2535) for systems beyond the 24KB contract-size limit. Storage layout collisions across facets are a permanent footgun; use deterministic storage namespaces.
```solidity
// EIP-7201 namespaced storage: modern pattern for upgradeable contracts
// to avoid storage collisions.
// Id, Market, and Position are the protocol's own types, defined elsewhere.
library MorphoStorage {
    /// @custom:storage-location erc7201:morpho.main
    struct MainStorage {
        mapping(Id => Market) market;
        mapping(Id => mapping(address => Position)) position;
        address owner;
        bool paused;
    }

    // keccak256(abi.encode(uint256(keccak256("morpho.main")) - 1)) & ~bytes32(uint256(0xff))
    bytes32 private constant SLOT = 0x...;

    // Returns a storage pointer rooted at the namespaced slot.
    function load() internal pure returns (MainStorage storage $) {
        assembly { $.slot := SLOT }
    }
}
```
Default to immutable. If you must upgrade, default to a transparent proxy behind a long timelock with EIP-7201 storage. Reach for UUPS only when proxy size matters. Reach for diamond only when contract size forces it. Document your upgrade plan in the contract.
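The UUPS bricking risk can be reduced with a compatibility check before the implementation slot is written. This is a minimal sketch, assuming an ERC-1967 proxy that delegatecalls into the implementation: `UUPSBase` and `NotUUPSCompatible` are hypothetical names, and the proxy itself plus an only-via-proxy guard are omitted. The check is a sanity test that the target declares itself UUPS-aware; it does not prove the new implementation's upgrade function is correct.

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.20;

interface IProxiable {
    function proxiableUUID() external view returns (bytes32);
}

/// Hypothetical base for UUPS-style implementations. Every new implementation
/// inherits it, so the upgrade function is carried forward by construction.
abstract contract UUPSBase is IProxiable {
    // ERC-1967 implementation slot.
    bytes32 internal constant IMPL_SLOT =
        bytes32(uint256(keccak256("eip1967.proxy.implementation")) - 1);

    error NotUUPSCompatible(address implementation);

    function proxiableUUID() external pure override returns (bytes32) {
        return IMPL_SLOT;
    }

    /// Access-control hook; each implementation decides who may upgrade.
    function _authorizeUpgrade(address newImplementation) internal virtual;

    function upgradeTo(address newImplementation) external {
        _authorizeUpgrade(newImplementation);
        // Refuse targets that do not report the expected slot; any failure
        // of this probe reverts the whole upgrade.
        try IProxiable(newImplementation).proxiableUUID() returns (bytes32 slot) {
            if (slot != IMPL_SLOT) revert NotUUPSCompatible(newImplementation);
        } catch {
            revert NotUUPSCompatible(newImplementation);
        }
        bytes32 implSlot = IMPL_SLOT;
        assembly {
            sstore(implSlot, newImplementation)
        }
    }
}
```

An upgrade test in the deployment pipeline that performs V1 to V2 and then V2 to V2 again is a cheap way to confirm the new implementation can still upgrade itself.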
Emergency response
When something is wrong, you have minutes. The tooling, runbook, and people must be ready before the incident.
- Detection. Tenderly Alerts / OpenZeppelin Defender / Forta detectors / custom subgraph monitors. Page on-call.
- Triage. Is it an exploit-in-progress, a near-miss, or a false alarm? Joint call with protocol engineers + guardians.
- Containment. Pause if available; if not, race against the attacker.
- Mitigation. Upgrade impl, deploy a fixed contract, or socialize the loss.
- Communication. Honest, prompt, public. Post-mortem within a week.
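```solidity
// Minimal pause primitive
contract Pausable {
    error EnforcedPause(); // must not share a name with the Paused event
    error NotGuardian();
    error NotOwner();
    event Paused();
    event Unpaused();

    bool public paused;
    address public guardian; // may pause only
    address public owner;    // DAO / timelock; may unpause

    constructor(address guardian_, address owner_) {
        guardian = guardian_;
        owner = owner_;
    }

    modifier whenNotPaused() {
        if (paused) revert EnforcedPause();
        _;
    }
    modifier onlyOwner() {
        if (msg.sender != owner) revert NotOwner();
        _;
    }

    function pause() external {
        if (msg.sender != guardian) revert NotGuardian();
        paused = true;
        emit Paused();
    }
    function unpause() external onlyOwner {
        paused = false;
        emit Unpaused();
    }
}
```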
Design notes:
- The guardian can pause but cannot move funds, so compromising the guardian key gives no direct access to funds; the worst case is a griefing pause.
- Unpause requires the owner (DAO / timelocked), so a compromised guardian cannot lift a pause mid-incident, and recovery never depends on the guardian key.
- "Pause" should not block repayment, or borrowers cannot self-repair; nor should it block liquidation of already-unhealthy positions, or bad debt accumulates.
- Some protocols expose graceful-shutdown: borrows pause, but supply / withdraw / repay / liquidate continue, so capital can leave.
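The graceful-shutdown note above can be sketched by gating only the risk-increasing path. This is a minimal sketch, not a real lending market: `ToyMarket` is hypothetical and tracks debt only, with no collateral, token transfers, or liquidation logic.

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.20;

/// Hypothetical toy market: debt accounting only.
contract ToyMarket {
    error BorrowsPaused();
    error NotGuardian();

    event BorrowsPausedBy(address indexed guardian);

    address public immutable guardian;
    bool public borrowsPaused;
    mapping(address => uint256) public debtOf;

    constructor(address guardian_) {
        guardian = guardian_;
    }

    function pauseBorrows() external {
        if (msg.sender != guardian) revert NotGuardian();
        borrowsPaused = true;
        emit BorrowsPausedBy(msg.sender);
    }

    /// Risk-increasing path: blocked during shutdown.
    function borrow(uint256 amount) external {
        if (borrowsPaused) revert BorrowsPaused();
        debtOf[msg.sender] += amount;
    }

    /// Risk-reducing path: deliberately not gated by the pause flag.
    function repay(uint256 amount) external {
        debtOf[msg.sender] -= amount; // reverts if amount exceeds debt
    }
}
```

In a full market the same rule would apply to withdraw and liquidate: anything that lets capital leave or debt shrink stays open while new exposure is blocked.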
Bug bounties & public contests
For mainnet protocols, an ongoing Immunefi (or similar) bounty is the norm. Standard structure:
- Tier ladder. Critical: 10% of impacted TVL, capped at $1M-$10M. High: $100k. Medium: $25k. Low: $1k.
- Scope. Specific contract addresses, specific chains. Out-of-scope is explicit.
- Disclosure policy. Reporter agrees not to disclose for N days while you patch.
- Triage SLA. Reply within 24-48 hours.
Public contests (Code4rena, Sherlock, Cantina) are time-bound, prize-pool reviews. Use them for fresh codebases and major upgrades, after audits but before mainnet. They surface bugs that time-boxed audits miss and bugs you didn't think to look for.
Post-mortem discipline
After an incident — exploit, near-miss, or "just" a deployment that didn't go as planned — write the post-mortem. Within a week. Public.
The template that works:
- Timeline. When did each event happen (UTC), to the minute.
- Root cause. What was the bug? Not "the attacker did X" — what was wrong in the code or design?
- Impact. Who lost what. Numbers, not adjectives.
- Detection. How did you find out? How long did it take?
- Response. What did you do?
- Resolution. Is the bug closed? Are users made whole?
- Lessons learned. What did you change in code, in process, in monitoring?
In interviews, when asked "tell me about a time something went wrong," the senior answer mirrors a post-mortem: timeline, root cause, your role, what you changed afterwards. Skip the blame; emphasize the systemic fix. This is the most distinguishing behavioral signal in protocol engineering.