Section B · Technical Core

Governance & Audit

The lifecycle from "code is done" to "code is on mainnet and we're sleeping at night." Audits, governance attack surfaces, upgradeability, and incident response.

The audit cycle

A protocol-grade audit cycle for a non-trivial change runs something like:

  1. Internal review. Other senior protocol engineers on your own team read the diff. Often the most valuable single step.
  2. Targeted invariant / fuzzing campaign. Run Echidna / Foundry invariant tests against the change for days, not hours.
  3. External audit — auditor A. A reputable firm (Spearbit, Cantina, OpenZeppelin, Trail of Bits, ChainSecurity) reviews the diff over 2-6 weeks. Output: findings report.
  4. Fix & re-review. You fix the high/medium findings; auditor A re-reviews the fixes.
  5. External audit — auditor B. A second, independent audit firm reviews the (now patched) code. Two auditors catch different bug classes.
  6. Formal verification engagement. Certora or equivalent writes specs against the implementation. Often runs in parallel with audits.
  7. Public contest. Code4rena / Sherlock / Cantina hosts an open audit contest. Hundreds of eyes, capped reward pool.
  8. Bug bounty live. Immunefi or similar; on-chain code is open to ongoing bounty.
  9. Mainnet deploy. Often with TVL caps, isolated markets, or staged rollout.
  10. Ongoing monitoring & on-call. Real-time alerting; on-call rotation; incident response playbooks.
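Step 2 is easier to picture with a toy. The Python sketch below is not Echidna or Foundry, and ToyVault is a hypothetical share-based vault rather than any real protocol. It shows only the shape of an invariant campaign: random call sequences, with every invariant re-checked after every call. The invariants used here (share accounting, asset conservation, share price never falls) are typical for vaults whose rounding favors the vault.

```python
import random


class ToyVault:
    """Hypothetical share-based vault (no real protocol); rounding is always down, favoring the vault."""

    def __init__(self) -> None:
        self.total_assets = 0
        self.total_shares = 0
        self.shares: dict[str, int] = {}

    def deposit(self, user: str, assets: int) -> int:
        if self.total_shares == 0:
            minted = assets  # first depositor mints 1:1
        else:
            minted = assets * self.total_shares // self.total_assets  # round down
        self.total_assets += assets
        self.total_shares += minted
        self.shares[user] = self.shares.get(user, 0) + minted
        return minted

    def withdraw(self, user: str, shares: int) -> int:
        shares = min(shares, self.shares.get(user, 0))
        if shares == 0:
            return 0
        assets = shares * self.total_assets // self.total_shares  # round down
        self.total_assets -= assets
        self.total_shares -= shares
        self.shares[user] -= shares
        return assets

    def accrue(self, assets: int) -> None:
        """Yield or a direct donation: assets arrive without new shares."""
        self.total_assets += assets


def check_invariants(v: ToyVault, before: tuple[int, int], inflow: int, outflow: int) -> None:
    assert sum(v.shares.values()) == v.total_shares, "share accounting drifted"
    assert v.total_assets == inflow - outflow, "assets created or destroyed"
    old_assets, old_shares = before
    if old_shares > 0 and v.total_shares > 0:
        # Price per share must never fall: new_assets / new_shares >= old_assets / old_shares.
        assert v.total_assets * old_shares >= old_assets * v.total_shares, "share price fell"


def fuzz(runs: int, seed: int = 0, vault_cls: type = ToyVault) -> int:
    """Random deposit / withdraw / accrue sequences; invariants checked after every call."""
    rng = random.Random(seed)
    users = ["alice", "bob", "carol"]
    vault = vault_cls()
    inflow = outflow = 0
    for _ in range(runs):
        before = (vault.total_assets, vault.total_shares)
        user, amount, roll = rng.choice(users), rng.randint(1, 10**6), rng.random()
        if roll < 0.45:
            vault.deposit(user, amount)
            inflow += amount
        elif roll < 0.90:
            outflow += vault.withdraw(user, amount)
        else:
            vault.accrue(amount // 10)  # occasional yield pushes the price off 1:1
            inflow += amount // 10
        check_invariants(vault, before, inflow, outflow)
    return runs
```

Changing withdraw to round up (a classic vault bug) makes the share-price assertion fail at the first partial withdrawal at a non-integer price. This is the kind of finding the campaign should surface before an auditor does.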
The senior framing

Audits are not a substitute for design rigor. They are the final filter on code that you already believe is correct. A protocol that ships its first draft to audit and then patches whatever comes back is doing audits wrong. The internal-review and invariant-campaign steps catch the easy bugs at a small fraction of an external audit's cost, which leaves the auditors' time for the subtle ones.

Multi-auditor strategy

Why two (or more) audits? Different shops have different strengths:

Shop | Strength
--- | ---
Spearbit / Cantina | Senior individual auditors; mechanism-design fluent
OpenZeppelin | Process-heavy; deep on standards and upgradeability
Trail of Bits | Tooling-heavy (Echidna, Slither); systems-engineering mindset
ChainSecurity | Strong on formal methods; rigorous
Certora | Formal verification specialist (not a traditional audit)
Runtime Verification | K framework / Kontrol; formal background
Code4rena / Sherlock / Cantina contests | Wide-net public reviews; great for surfacing unknowns
Immunefi bounty | Continuous bounty for post-mainnet finds

Sequence them carefully. Running two audits in parallel costs more in protocol-team bandwidth. You cannot run fix-and-re-review cycles with both firms at once, and fixes for one firm's findings change the code the other firm is reading. Two sequential audits, with FV running in parallel, is the modern norm for serious launches.

Severity rubric

You should be able to classify findings without looking at the rubric.

Severity | Definition | Examples
--- | --- | ---
Critical | Direct loss of funds or protocol insolvency; exploitable | Reentrancy that drains the pool; oracle bug allowing free borrows
High | Significant loss possible under specific conditions | Donation attack; griefing of liquidations; bad-debt accumulation under a tail event
Medium | Limited loss / griefing; or critical impact under unlikely conditions | Suboptimal rounding leaking dust; permit signature replay across chains
Low | Best-practice violation, minimal direct impact | Missing event emission; redundant checks; gas suggestions
Informational | Code quality, style | Naming, comment improvements
Gas | Optimizations | SLOAD caching; storage layout reordering

Sherlock / Cantina / Code4rena have minor variations (e.g., Sherlock weights severity by exploitability AND impact; Code4rena uses 3-tier "H/M/QA"). Read the platform's rubric before triaging.
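A compact way to internalize the rubric is as impact × likelihood. The mapping below is a hypothetical simplification of the table above. The enum names and the three likelihood levels belong to this sketch, not to any platform. It encodes the one non-obvious rule: critical impact under unlikely conditions drops to Medium.

```python
from enum import IntEnum


class Impact(IntEnum):
    NONE = 0    # style, naming, gas
    LOW = 1     # best-practice violation, minimal direct impact
    MEDIUM = 2  # limited loss or griefing
    HIGH = 3    # direct loss of funds or insolvency


class Likelihood(IntEnum):
    UNLIKELY = 0     # needs rare or tail conditions
    CONDITIONAL = 1  # needs specific but plausible conditions
    DIRECT = 2       # exploitable as-is


def classify(impact: Impact, likelihood: Likelihood) -> str:
    """Hypothetical simplification of the rubric table; not any platform's official rules."""
    if impact == Impact.NONE:
        return "Informational"
    if impact == Impact.LOW:
        return "Low"
    if impact == Impact.MEDIUM:
        return "Medium"
    # High impact: severity depends on how reachable the loss is.
    return {
        Likelihood.DIRECT: "Critical",
        Likelihood.CONDITIONAL: "High",
        Likelihood.UNLIKELY: "Medium",
    }[likelihood]
```

For example, a reentrancy that drains the pool is classify(Impact.HIGH, Likelihood.DIRECT), which gives Critical. A donation attack that needs a nearly empty market is closer to HIGH / CONDITIONAL, which gives High.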

Audit-report triage

When the report lands, you have N findings to deal with. The process:

  1. Read every finding to the bottom. Do not skim. The interesting finding is often labeled "low" with a subtle root cause.
  2. Reproduce the issue with a test. Until you can reproduce, you don't understand the bug.
  3. Classify your response:
    • Acknowledge & fix. Straightforward.
    • Acknowledge, won't fix. The issue is real but a deliberate trade-off; document why.
    • Dispute. The finding is incorrect; explain why, with a test or a written argument. The auditor then updates or withdraws it.
    • Out of scope. The finding is about code not in the audit scope.
  4. Write the fix; add a regression test. The test is the proof the bug doesn't come back.
  5. Submit a fix PR; auditor re-reviews. Iterate until clean.
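A lightweight tracker keeps this loop honest. The sketch below is hypothetical and does not reflect any particular tool or report format. It encodes the rules from the list. A fix is closable only after it has been reproduced, pinned by a regression test, and re-reviewed. Every other response needs a written rationale, and disputes also need the auditor's sign-off.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Response(Enum):
    FIX = "acknowledge & fix"
    WONT_FIX = "acknowledge, won't fix"
    DISPUTE = "dispute"
    OUT_OF_SCOPE = "out of scope"


@dataclass
class Finding:
    ref: str                                # e.g. "H-01", as numbered in the report
    severity: str
    reproduced: bool = False                # step 2: a failing test exists
    response: Optional[Response] = None
    rationale: str = ""                     # trade-off, dispute argument, or scope note
    regression_test: Optional[str] = None   # name of the test that pins the fix
    auditor_signed_off: bool = False        # step 5: re-review done

    def closable(self) -> bool:
        if self.response is None:
            return False
        if self.response is Response.FIX:
            return self.reproduced and self.regression_test is not None and self.auditor_signed_off
        if not self.rationale:
            return False  # won't-fix, dispute and out-of-scope all need a written reason
        if self.response is Response.DISPUTE:
            return self.auditor_signed_off  # auditor updated or withdrew the finding
        return True


def open_findings(findings: list[Finding]) -> list[str]:
    """References of findings that are not yet closable, in report order."""
    return [f.ref for f in findings if not f.closable()]
```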
Interview moment

If asked "tell me about an audit finding you disagreed with," the senior answer walks through: (1) what the finding said, (2) what your model was, (3) what you presented to the auditor, (4) how the conversation resolved. The interviewer is checking that you can hold a technical position respectfully against an expert.

Governance attack surfaces

Six vectors, revisited from chapter 08 with structural fixes:

Attack | Structural mitigation
--- | ---
Flash governance (borrow tokens, vote, repay in the same tx) | Use voting power from a past block snapshot, not the current balance
Bribery markets / vote-buying | Long-lock vote-escrow (veToken); reduce the parameters set by governance
Multisig compromise | Hardware signers, threshold > 50%, geographic distribution, audited signer set
Timelock bypass | Every privileged function behind the same timelock; no special "fast" admin functions
Parameter-griefing | Allow-list of parameters; immutable per market; minimize the surface
Proposal smuggling (malicious calldata) | Pre-execution simulation; community-readable proposals; no abstract execute(bytes)
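The first row's mitigation is what checkpointed governance tokens implement on-chain (OpenZeppelin's ERC20Votes exposes it as getPastVotes). The Python model below is a hypothetical simplification that ignores delegation. Balances are checkpointed per block, and votes are counted at a snapshot block fixed before voting opens. Tokens borrowed and repaid inside the voting transaction therefore count for nothing.

```python
import bisect


class CheckpointedVotes:
    """Hypothetical model: per-account (block, balance) checkpoints; delegation is ignored."""

    def __init__(self) -> None:
        self._blocks: dict[str, list[int]] = {}
        self._values: dict[str, list[int]] = {}

    def balance_of(self, account: str) -> int:
        values = self._values.get(account)
        return values[-1] if values else 0

    def _write(self, account: str, block: int, value: int) -> None:
        blocks = self._blocks.setdefault(account, [])
        values = self._values.setdefault(account, [])
        if blocks and blocks[-1] == block:
            values[-1] = value  # same block: overwrite, so the checkpoint holds the end-of-block balance
        else:
            blocks.append(block)
            values.append(value)

    def mint(self, account: str, amount: int, block: int) -> None:
        self._write(account, block, self.balance_of(account) + amount)

    def burn(self, account: str, amount: int, block: int) -> None:
        self._write(account, block, self.balance_of(account) - amount)

    def past_votes(self, account: str, block: int) -> int:
        """Balance at the end of `block`: the last checkpoint at or before it."""
        blocks = self._blocks.get(account, [])
        i = bisect.bisect_right(blocks, block)
        return self._values[account][i - 1] if i else 0


votes = CheckpointedVotes()
votes.mint("holder", 100, block=10)
snapshot = 19  # hypothetical: fixed when the proposal was created at block 20
# Attacker flash-borrows 1,000,000 tokens, votes and repays, all within block 25.
votes.mint("attacker", 1_000_000, block=25)
current = votes.balance_of("attacker")            # what a naive current-balance vote counts
counted = votes.past_votes("attacker", snapshot)  # what a snapshot vote counts
votes.burn("attacker", 1_000_000, block=25)
```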

The deepest defense: minimize the number of things governance can do. Every privileged function is an attack surface. A core that cannot be upgraded, with markets whose parameters are immutable, has shrunk governance to the smallest possible surface.
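For the "Timelock bypass" row, the structural fix is a single queue that every privileged call passes through, with no side door. The sketch below is a hypothetical Python model, loosely in the spirit of Compound's Timelock and OpenZeppelin's TimelockController: queue the action, wait at least the minimum delay, then execute. It is not either contract's API, and it omits roles, grace periods and cancellation. sha256 stands in for keccak256 when hashing the action.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class Timelock:
    """Hypothetical model: every privileged action is queued, then executable after min_delay."""
    min_delay: int                                         # seconds
    queued: dict[str, int] = field(default_factory=dict)   # action id -> earliest execution time
    executed: list[str] = field(default_factory=list)

    @staticmethod
    def action_id(target: str, calldata: bytes) -> str:
        # sha256 stands in for the keccak256 hash an on-chain timelock would use.
        return hashlib.sha256(target.encode() + b"|" + calldata).hexdigest()

    def queue(self, target: str, calldata: bytes, now: int) -> str:
        aid = self.action_id(target, calldata)
        if aid in self.queued:
            raise ValueError("already queued")
        self.queued[aid] = now + self.min_delay
        return aid

    def execute(self, target: str, calldata: bytes, now: int) -> None:
        aid = self.action_id(target, calldata)
        eta = self.queued.get(aid)
        if eta is None:
            raise PermissionError("not queued: there is no fast path around the timelock")
        if now < eta:
            raise PermissionError(f"too early: executable at {eta}")
        del self.queued[aid]
        self.executed.append(target)
```

The delay gives users time to exit and gives reviewers time to simulate the queued calldata. The same window is the defense in the "Proposal smuggling" row.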

Upgradeability trade-offs in depth

Covered structurally in chapter 06; here, the audit perspective:

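  • Transparent proxy routes calls by sender. The admin can only reach the proxy's upgrade functions, and every other caller is always delegated to the implementation, so admin and user calls cannot collide on a function selector. Costs: a larger proxy, an admin check on every call, and two reserved storage slots (implementation and admin).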
  • UUPS moves upgrade logic into the implementation. Smaller proxy. Risk: an upgrade that removes the upgrade function bricks the contract. Always include the upgrade function in every new impl.
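  • Beacon centralizes upgrades across N instances. One transaction upgrades every proxy that points at the beacon, and each instance is cheap to deploy. The flip side is an extra beacon lookup on every call and a higher blast radius, because a bad implementation hits all N instances at once.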
  • Diamond (EIP-2535) for systems beyond the 24KB contract-size limit. Storage layout collisions across facets are a permanent footgun; use deterministic storage namespaces.
// EIP-7201 namespaced storage — modern pattern for upgradeable contracts
// to avoid storage collisions.
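library MorphoStorage {
    // Id, Market and Position are the protocol's own types, defined elsewhere.
    /// @custom:storage-location erc7201:morpho.main
    struct MainStorage {
        mapping(Id => Market) market;
        mapping(Id => mapping(address => Position)) position;
        address owner;
        bool paused;
    }

    // keccak256(abi.encode(uint256(keccak256("morpho.main")) - 1)) & ~bytes32(uint256(0xff))
    // Inline assembly can only reference literal constants, so the value of the
    // expression above is precomputed and pasted here rather than written inline.
    bytes32 private constant SLOT = 0x...;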

    function load() internal pure returns (MainStorage storage $) {
        assembly { $.slot := SLOT }
    }
}
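Whichever pattern is chosen, auditors check storage-layout compatibility on every upgrade. A new implementation may append variables, but it must not reorder, retype, or remove existing ones. The OpenZeppelin Upgrades plugins automate this validation. The Python sketch below is a hypothetical, much-simplified version: it compares (name, type) pairs by declaration position and ignores slot packing, structs, and storage gaps.

```python
from typing import NamedTuple


class Var(NamedTuple):
    name: str
    type: str


def layout_errors(old: list[Var], new: list[Var]) -> list[str]:
    """Hypothetical simplified check by declaration position; appending after len(old) is allowed."""
    errors = []
    for i, old_var in enumerate(old):
        if i >= len(new):
            errors.append(f"position {i}: '{old_var.name}' removed")
            continue
        new_var = new[i]
        if new_var.type != old_var.type:
            errors.append(f"position {i}: type changed {old_var.type} -> {new_var.type}")
        elif new_var.name != old_var.name:
            # Same type, new name: layout-safe, but usually a sign of reordering; confirm intent.
            errors.append(f"position {i}: renamed {old_var.name} -> {new_var.name}")
    return errors
```

EIP-7201 namespaces, as in the library above, shrink this problem: each namespace's layout only has to stay append-compatible with its own previous version.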
The senior position

Default to immutable. If you must upgrade, default to a transparent proxy behind a long timelock with EIP-7201 storage. Reach for UUPS only when proxy size matters. Reach for diamond only when contract size forces it. Document your upgrade plan in the contract.

Emergency response

When something is wrong, you have minutes. The tooling, runbook, and people must be ready before the incident.

  1. Detection. Tenderly Alerts / OpenZeppelin Defender / Forta detectors / custom subgraph monitors. Page on-call.
  2. Triage. Is it an exploit-in-progress, a near-miss, or a false alarm? Joint call with protocol engineers + guardians.
  3. Containment. Pause if available; if not, race against the attacker.
  4. Mitigation. Upgrade impl, deploy a fixed contract, or socialize the loss.
  5. Communication. Honest, prompt, public. Post-mortem within a week.
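Step 1 often starts as plain threshold rules over on-chain state. The sketch below is a hypothetical detector, not the API of Tenderly, Defender, or Forta. It pages when a pool's balance falls more than a set fraction below its recent peak within a sliding window of blocks. The 20% and 10-block defaults are placeholders to tune per market.

```python
from collections import deque


class OutflowDetector:
    """Hypothetical rule: alert if the balance drops more than max_drop below its peak within the window."""

    def __init__(self, max_drop: float = 0.20, window_blocks: int = 10) -> None:
        self.max_drop = max_drop
        self.window_blocks = window_blocks
        self.history: deque[tuple[int, int]] = deque()  # (block, balance) samples

    def observe(self, block: int, balance: int) -> bool:
        """Record a sample; return True if on-call should be paged."""
        self.history.append((block, balance))
        while self.history and self.history[0][0] < block - self.window_blocks:
            self.history.popleft()  # forget samples older than the window
        peak = max(b for _, b in self.history)
        return peak > 0 and (peak - balance) / peak > self.max_drop
```

A slow drain spread over many windows slips past this rule, which is why real setups layer several detectors.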
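// SPDX-License-Identifier: MIT
pragma solidity ^0.8.20;

// Minimal pause primitive: the guardian can pause; only the owner (DAO / timelock) can unpause.
contract Pausable {
    bool public paused;
    address public immutable owner;     // DAO / timelock
    address public immutable guardian;  // fast-acting pauser; holds no fund-moving rights

    error EnforcedPause();  // distinct name: an error and an event cannot both be called Paused
    error NotGuardian();
    error NotOwner();

    event Paused(address indexed account);
    event Unpaused(address indexed account);

    constructor(address owner_, address guardian_) {
        owner = owner_;
        guardian = guardian_;
    }

    modifier onlyOwner() {
        if (msg.sender != owner) revert NotOwner();
        _;
    }

    modifier whenNotPaused() {
        if (paused) revert EnforcedPause();
        _;
    }

    function pause() external {
        if (msg.sender != guardian) revert NotGuardian();
        paused = true;
        emit Paused(msg.sender);
    }

    function unpause() external onlyOwner {
        paused = false;
        emit Unpaused(msg.sender);
    }
}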

Design notes:

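  • The guardian can pause but cannot move funds, so compromising the guardian key yields no direct theft. The worst case is griefing users with an unwanted pause.
  • Unpausing requires the owner (DAO / timelocked), not the guardian. Even if the guardian key is lost, governance can always bring the protocol out of pause, and a compromised guardian cannot unpause to reopen an exploit that was just contained.
  • Pause should not block liquidations of already-unhealthy positions (otherwise bad debt accumulates while paused) or repayments (otherwise borrowers cannot self-repair).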
  • Some protocols expose graceful-shutdown: borrows pause, but supply / withdraw / repay / liquidate continue, so capital can leave.
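The last two notes can be enforced structurally with a per-action pause map. In this hypothetical Python model, the action set and the Market class are illustrative and not any protocol's API. Repay and liquidate can never be paused. An emergency pause blocks new risk and withdrawals, and a graceful shutdown blocks only borrows.

```python
from enum import Enum, auto


class Action(Enum):
    SUPPLY = auto()
    BORROW = auto()
    WITHDRAW = auto()
    REPAY = auto()
    LIQUIDATE = auto()


# Exit paths that must stay live so unhealthy positions can always be unwound.
NEVER_PAUSED = frozenset({Action.REPAY, Action.LIQUIDATE})


class Market:
    def __init__(self) -> None:
        self.paused: set[Action] = set()

    def pause(self, actions: set[Action]) -> None:
        if actions & NEVER_PAUSED:
            raise ValueError("repay and liquidate must stay live")
        self.paused |= actions

    def emergency_pause(self) -> None:
        self.pause({Action.SUPPLY, Action.BORROW, Action.WITHDRAW})

    def graceful_shutdown(self) -> None:
        self.pause({Action.BORROW})  # no new risk; capital can still leave

    def is_allowed(self, action: Action) -> bool:
        return action not in self.paused
```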

Bug bounties & public contests

For mainnet protocols, an ongoing Immunefi (or similar) bounty is the norm. Standard structure:

  • Tier ladder. Critical: 10% of the funds at risk (impacted TVL), capped somewhere in the $1M-$10M range. High: $100k. Medium: $25k. Low: $1k.
  • Scope. Specific contract addresses, specific chains. Out-of-scope is explicit.
  • Disclosure policy. Reporter agrees not to disclose for N days while you patch.
  • Triage SLA. Reply within 24-48 hours.
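The critical tier is a formula, and writing it down removes ambiguity before a report arrives. The sketch assumes the illustrative figures from the ladder above (10% of funds at risk, plus the flat High / Medium / Low amounts). It takes the cap as a parameter because programs set it anywhere in the quoted range.

```python
from decimal import Decimal

# Illustrative flat amounts from the ladder above; every program sets its own.
FLAT_REWARDS = {"high": Decimal(100_000), "medium": Decimal(25_000), "low": Decimal(1_000)}


def bounty_payout(severity: str, funds_at_risk: Decimal,
                  critical_cap: Decimal = Decimal(10_000_000)) -> Decimal:
    """Hypothetical ladder: criticals pay 10% of funds at risk up to the cap; other tiers are flat."""
    if severity == "critical":
        return min(funds_at_risk * Decimal("0.10"), critical_cap)
    return FLAT_REWARDS[severity]
```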

Public contests (Code4rena, Sherlock, Cantina) are time-bound reviews paid from a prize pool. Use them for fresh codebases and major upgrades, after the audits but before mainnet. They surface bugs that time-boxed audits lacked the budget to chase, and bugs you didn't think to look for.

Post-mortem discipline

After an incident — exploit, near-miss, or "just" a deployment that didn't go as planned — write the post-mortem. Within a week. Public.

The template that works:

  1. Timeline. When did each event happen (UTC), to the minute.
  2. Root cause. What was the bug? Not "the attacker did X" — what was wrong in the code or design?
  3. Impact. Who lost what. Numbers, not adjectives.
  4. Detection. How did you find out? How long did it take?
  5. Response. What did you do?
  6. Resolution. Is the bug closed? Are users made whole?
  7. Lessons learned. What did you change in code, in process, in monitoring?
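Items 1 and 4 are easier to get right when the timeline is kept as structured data from the first minute. The helper below is a minimal, hypothetical sketch in which event labels are free text. It renders timezone-aware timestamps in UTC to the minute and derives time-to-detect and time-to-contain.

```python
from datetime import datetime, timedelta, timezone


def render_timeline(events: list[tuple[datetime, str]]) -> list[str]:
    """Sort events and format each as 'YYYY-MM-DD HH:MM UTC  description'.

    Timestamps must be timezone-aware; naive datetimes would be read as local time.
    """
    ordered = sorted(events, key=lambda event: event[0])
    return [f"{t.astimezone(timezone.utc):%Y-%m-%d %H:%M} UTC  {what}" for t, what in ordered]


def response_metrics(first_bad_tx: datetime, detected: datetime,
                     contained: datetime) -> dict[str, timedelta]:
    """Durations for the Detection and Response sections of the post-mortem."""
    return {
        "time_to_detect": detected - first_bad_tx,
        "time_to_contain": contained - detected,
    }
```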
Behavioral signal

In interviews, when asked "tell me about a time something went wrong," the senior answer mirrors a post-mortem: timeline, root cause, your role, and what you changed afterwards. Skip the blame; emphasize the systemic fix. This is one of the most distinguishing behavioral signals in protocol engineering.