Error Handling & Failure Modes
A taxonomy of the bugs that kill lending protocols. For each: how it looks, how to detect, how to design it out.
A taxonomy of lending bugs
The post-mortems for lending protocols across the last few cycles cluster into a recognizable shape. A senior engineer treats this list as a reflex — every PR is read through it.
| Class | Symptom | Severity |
|---|---|---|
| Bad-debt accrual | Protocol becomes insolvent silently | Critical |
| Share inflation (donation) | First depositor steals from later depositors | High |
| Rounding in user favor | Small per-tx drain, eventually catastrophic | High |
| Oracle staleness | Stale price enables free borrow / wrong liquidation | Critical |
| Reentrancy in callbacks | State invariant violated during external call | Critical |
| MEV / frontrun on liquidation | Liquidators race to extract value; protocol stays solvent, but borrowers get worse outcomes | Medium |
| IRM extremes | Rate overflow / underflow at U near 0 or 100% | High |
| Governance attack | Voter manipulation; timelock-bypass; flash governance | Critical |
| ERC-20 quirks | USDT (no return value), fee-on-transfer, rebasing | High |
| Read-only reentrancy | Stale view fools an external integrator | High |
Bad-debt accrual
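What it looks like: a position falls underwater faster than liquidators can act. It first crosses the liquidation threshold (debt > collateral value × LLTV) and, if nobody liquidates, keeps falling until debt exceeds the collateral value itself. The protocol has now lent out more than the collateral can repay.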
Causes:
- Oracle price gap (a discrete jump exceeds the safety margin between max borrow LTV and LLTV).
- Collateral becomes illiquid (no one will repay debt to receive seizable collateral).
- Liquidator MEV / gas spike makes liquidation unprofitable.
- Oracle outage prevents liquidation.
Detection: off-chain monitoring of every position's health; an invariant test asserting sum-of-debts ≤ sum-of-collateral-values.
Design-out:
- Choose a max borrow LTV strictly less than LLTV — the gap is the safety margin for price moves between updates.
- Add pre-liquidations to start closing positions before LLTV.
- Socialize bad debt immediately when collateral hits 0 (don't let it accumulate silently).
- For high-vol collaterals, prefer Dutch-auction liquidations that adapt the incentive.
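The two thresholds above (liquidatable versus bad debt) can be sketched as pure checks. This is a minimal illustration, not any specific protocol's code: the library name, the 1e36 price scale, and the function names are assumptions chosen for the example.

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.20;

/// Illustrative sketch. Names and scales are assumptions, not a real protocol API.
library PositionHealth {
    uint256 internal constant WAD = 1e18;         // LLTV scale: 0.8e18 means 80%
    uint256 internal constant PRICE_SCALE = 1e36; // assumed oracle price scale

    /// Collateral value in loan-token units, rounded down (against the borrower).
    function collateralValue(uint256 collateral, uint256 price) internal pure returns (uint256) {
        return collateral * price / PRICE_SCALE;
    }

    /// True while debt is within collateral value × LLTV (not liquidatable).
    function isHealthy(uint256 collateral, uint256 price, uint256 debt, uint256 lltv)
        internal
        pure
        returns (bool)
    {
        return debt <= collateralValue(collateral, price) * lltv / WAD;
    }

    /// True once debt exceeds the full collateral value: seizing everything cannot repay it.
    function hasBadDebt(uint256 collateral, uint256 price, uint256 debt) internal pure returns (bool) {
        return debt > collateralValue(collateral, price);
    }
}
```

An off-chain monitor would run the same two checks per position: the first tells liquidators when to act, the second tells the protocol when a loss has to be realized.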
Share-inflation / the "donation" attack
A famous ERC-4626 / Compound v2 class of bug.
What it looks like:
- Vault is empty (totalAssets = 0, totalShares = 0).
- Attacker deposits 1 wei. Mints 1 share.
- Attacker transfers (donates) 1e18 of the underlying directly to the vault. totalAssets is now 1e18 + 1, totalShares is still 1.
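- Victim deposits 1.5e18. With shares = assets × totalShares / totalAssets, victim gets 1.5e18 × 1 / (1e18 + 1) = 1 share (rounded down).
- The vault now holds 2.5e18 + 1 against 2 shares, so the victim's share is worth only ~1.25e18. The attacker redeems their 1 share for ~1.25e18 against a cost of 1e18 + 1: a profit of ~0.25e18, about one sixth of the victim's deposit. A deposit just under 2e18 loses ~25%, and a deposit of 1e18 or less mints 0 shares and is lost entirely in a vault that does not reject zero-share mints.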
```solidity
// Mitigation 1: virtual shares (OpenZeppelin v5)
function _convertToShares(uint256 assets, Math.Rounding rounding) internal view returns (uint256) {
    return assets.mulDiv(totalSupply() + 10 ** _decimalsOffset(), totalAssets() + 1, rounding);
}

// Mitigation 2: dead shares, mint a chunk of shares to address(0) on first deposit
function _deposit(uint256 assets, address receiver) internal {
    uint256 shares = previewDeposit(assets);
    if (totalSupply() == 0) {
        // Send DEAD_SHARES to address(0), the rest to receiver.
        // The subtraction reverts if the first deposit mints fewer than DEAD_SHARES.
        // Note: OpenZeppelin's ERC20 rejects _mint(address(0), ...); with that base,
        // lock the shares at a burn address such as address(0xdead) instead.
        _mint(address(0), DEAD_SHARES);
        shares -= DEAD_SHARES;
    }
    _mint(receiver, shares);
}

// Mitigation 3 (Morpho Blue): SHARES_OFFSET, shares are 1e6× the assets at parity
// (Morpho Blue's SharesMathLib names this VIRTUAL_SHARES = 1e6, paired with VIRTUAL_ASSETS = 1.)
uint256 public constant SHARES_OFFSET = 1e6;
// First supply: 1 wei assets => 1e6 shares. Donating 1 wei doesn't move the ratio meaningfully.
```
Any time you see shares = assets × totalShares / totalAssets, you should automatically check: what happens on first deposit? What if totalAssets is donated? If your mental model doesn't immediately give you the answer, you have a bug.
Rounding direction
Every share / asset conversion rounds some way. The rule is: always in the protocol's favor.
| Operation | Round shares | Round assets |
|---|---|---|
| Deposit (assets→shares) | Down | - |
| Withdraw (assets→shares) | Up | - |
| Mint (shares→assets) | - | Up |
| Redeem (shares→assets) | - | Down |
| Borrow (assets→shares) | Up | - |
| Repay (shares→assets) | - | Up |
Mnemonic: "users pay up to one wei more, or receive up to one wei less, than the exact amount; the protocol never comes out behind." Because every rounding error lands on the same side, repeated round-trips cannot extract value through rounding.
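The rounding rule can be sketched with two helpers, one rounding down and one rounding up, applied to the four vault conversions. The library and function names are illustrative assumptions; the sketch also assumes non-zero totals (in practice the virtual offset guarantees that).

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.20;

/// Illustrative sketch. Assumes totalAssets and totalShares are non-zero.
library RoundingSketch {
    function mulDivDown(uint256 x, uint256 y, uint256 d) internal pure returns (uint256) {
        return (x * y) / d;
    }

    function mulDivUp(uint256 x, uint256 y, uint256 d) internal pure returns (uint256) {
        return (x * y + (d - 1)) / d;
    }

    /// Deposit: user gives assets, receives shares. Round shares down.
    function sharesForDeposit(uint256 assets, uint256 totalAssets, uint256 totalShares)
        internal pure returns (uint256)
    {
        return mulDivDown(assets, totalShares, totalAssets);
    }

    /// Withdraw: user names assets, burns shares. Round shares up.
    function sharesForWithdraw(uint256 assets, uint256 totalAssets, uint256 totalShares)
        internal pure returns (uint256)
    {
        return mulDivUp(assets, totalShares, totalAssets);
    }

    /// Mint: user names shares, pays assets. Round assets up.
    function assetsForMint(uint256 shares, uint256 totalAssets, uint256 totalShares)
        internal pure returns (uint256)
    {
        return mulDivUp(shares, totalAssets, totalShares);
    }

    /// Redeem: user burns shares, receives assets. Round assets down.
    function assetsForRedeem(uint256 shares, uint256 totalAssets, uint256 totalShares)
        internal pure returns (uint256)
    {
        return mulDivDown(shares, totalAssets, totalShares);
    }
}
```

With totalAssets = 3 and totalShares = 2, depositing 1 asset mints 0 shares while withdrawing 1 asset burns 1 share: the same exact ratio (2/3) rounds in opposite directions depending on who would benefit.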
Oracle staleness
Covered in depth in chapter 05; key bug class summary:
- No staleness check: oracle returns a value but it's hours old. Borrower opens an undercollateralized position. Liquidatable when oracle updates, but protocol already lost.
- Only check updatedAt, not answer: some feeds return 0 on error. Always check answer > 0 and updatedAt > block.timestamp - MAX_STALENESS.
- Ignoring L2 sequencer: on L2, sequencer outages stall the price feed. Use Chainlink's L2 Sequencer Uptime feed and apply a grace period.
- Trusting confidence intervals blindly: on Pyth, a wide conf means the publisher network is uncertain. Reject prices with conf / price > threshold.
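The Chainlink-side checks (positivity, staleness, sequencer status plus grace period) can be sketched as below. The interface matches Chainlink's latestRoundData signature; the library name, error names, and the convention of passing address(0) for "no sequencer feed" are assumptions made for this example.

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.20;

interface AggregatorV3Interface {
    function latestRoundData()
        external
        view
        returns (uint80 roundId, int256 answer, uint256 startedAt, uint256 updatedAt, uint80 answeredInRound);
}

/// Illustrative sketch. Error names and parameters are assumptions.
library OracleGuards {
    error InvalidAnswer(int256 answer);
    error StalePrice(uint256 updatedAt, uint256 nowTs);
    error SequencerDown();
    error GracePeriodNotOver(uint256 upSince, uint256 nowTs);

    /// Positivity and staleness checks on a single price reading.
    function checkedPrice(int256 answer, uint256 updatedAt, uint256 nowTs, uint256 maxStaleness)
        internal pure returns (uint256)
    {
        if (answer <= 0) revert InvalidAnswer(answer);
        if (nowTs > updatedAt + maxStaleness) revert StalePrice(updatedAt, nowTs);
        return uint256(answer);
    }

    /// Sequencer uptime feed convention: answer 0 = up, 1 = down; startedAt = last status change.
    function checkSequencer(int256 status, uint256 upSince, uint256 nowTs, uint256 gracePeriod)
        internal pure
    {
        if (status != 0) revert SequencerDown();
        if (nowTs <= upSince + gracePeriod) revert GracePeriodNotOver(upSince, nowTs);
    }

    /// Full read. Pass address(0) as sequencerFeed on chains without one (assumed convention).
    function read(
        AggregatorV3Interface feed,
        AggregatorV3Interface sequencerFeed,
        uint256 maxStaleness,
        uint256 gracePeriod
    ) internal view returns (uint256) {
        if (address(sequencerFeed) != address(0)) {
            (, int256 status, uint256 startedAt,,) = sequencerFeed.latestRoundData();
            checkSequencer(status, startedAt, block.timestamp, gracePeriod);
        }
        (, int256 answer,, uint256 updatedAt,) = feed.latestRoundData();
        return checkedPrice(answer, updatedAt, block.timestamp, maxStaleness);
    }
}
```

Keeping the checks in pure functions makes them easy to fuzz without mocking a feed; the Pyth confidence check would need its own guard and is not shown here.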
Reentrancy in callbacks
Most modern lending bugs in this class are not classic reentrancy — those are caught by even basic review. The subtler ones:
- Read-only reentrancy. Function A is mid-execution and calls out to user X. User X calls into another protocol, which reads this contract's state through a view getter. The getter returns a half-updated value that is inconsistent with the state A will eventually commit.
- Cross-function reentrancy. Function A makes an external call. The recipient calls function B, which observes A's not-yet-committed state.
- Callback-with-invariant-after. The pattern is sound when invariants are checked after the callback. The bug is when one branch (e.g., a fallback path) skips the post-callback invariant check.
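```solidity
// VULNERABLE: a view returns mid-update state during a callback.
// Import paths assume OpenZeppelin v5; ICallback is this example's own interface.
import {IERC20} from "@openzeppelin/contracts/token/ERC20/IERC20.sol";
import {SafeERC20} from "@openzeppelin/contracts/token/ERC20/utils/SafeERC20.sol";
import {ReentrancyGuard} from "@openzeppelin/contracts/utils/ReentrancyGuard.sol";

interface ICallback { function onWithdraw(bytes calldata data) external; }

contract Vault is ReentrancyGuard {
    using SafeERC20 for IERC20;
    IERC20 public asset;
    uint256 public totalAssets;
    uint256 public totalShares;
    function withdraw(uint256 amount, bytes calldata data) external nonReentrant {
        uint256 shares = amount * totalShares / totalAssets; // rounding and balances omitted for brevity
        totalAssets -= amount; // assets side committed before the call
        asset.safeTransfer(msg.sender, amount);
        if (data.length != 0) ICallback(msg.sender).onWithdraw(data);
        // BUG: the shares side is committed only after the callback. nonReentrant blocks
        // re-entry into withdraw, but not an integrator reading pricePerShare() mid-callback.
        totalShares -= shares;
    }
    function pricePerShare() external view returns (uint256) {
        return totalAssets * 1e18 / totalShares; // understated while the callback runs
    }
}
```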
The senior fix: any view function that other protocols might read as an oracle should either (a) revert during the callback window via a transient lock, or (b) be guaranteed-consistent.
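Option (a) can be sketched as a reentrancy guard whose lock is also checked by views. This is a minimal illustration using a regular storage slot; a production version might use transient storage instead. All contract, modifier, and interface names here are hypothetical.

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.20;

/// Illustrative sketch. A storage-based lock that views can also check.
abstract contract ViewSafeReentrancyGuard {
    error Reentrancy();

    uint256 private constant NOT_ENTERED = 1;
    uint256 private constant ENTERED = 2;
    uint256 private _status = NOT_ENTERED;

    modifier nonReentrant() {
        if (_status == ENTERED) revert Reentrancy();
        _status = ENTERED;
        _;
        _status = NOT_ENTERED;
    }

    /// For view functions that integrators may treat as an oracle:
    /// revert instead of returning state from the middle of an update.
    modifier nonReentrantView() {
        if (_status == ENTERED) revert Reentrancy();
        _;
    }
}

interface IWithdrawCallback {
    function onWithdraw() external;
}

contract GuardedVaultSketch is ViewSafeReentrancyGuard {
    uint256 public totalAssets;
    uint256 public totalShares;

    constructor(uint256 assets, uint256 shares) {
        totalAssets = assets;
        totalShares = shares;
    }

    function withdrawWithCallback(uint256 assets, uint256 shares) external nonReentrant {
        totalAssets -= assets;
        // Untrusted call: pricePerShare() reverts for anyone reading it until we return.
        IWithdrawCallback(msg.sender).onWithdraw();
        totalShares -= shares;
    }

    function pricePerShare() external view nonReentrantView returns (uint256) {
        return totalAssets * 1e18 / totalShares;
    }
}
```

The trade-off is that integrators must tolerate a revert from the getter; that is usually preferable to silently pricing off inconsistent state.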
MEV & frontrunning of liquidations
Not strictly a bug — but a systemic design concern. When a position becomes liquidatable, every searcher in the mempool races. The most common effects:
- The liquidation incentive is competed away in priority fees; borrowers get punished, but searchers and validators capture the rents.
- Sandwich attacks on the price update that makes a position liquidatable.
- "Just-in-time" liquidity around oracles to maximize an arbitrage on a stale price.
Mitigations:
- Dutch auctions price-discover the liquidation incentive (LI): it starts small and grows over time, so the winning searcher takes only as much incentive as they need.
- Pre-liquidations move part of the close to a curated keeper, off the public mempool.
- Use commit-reveal or batch-auction primitives for price-sensitive transitions.
- For borrowers, offer opt-in pre-liquidations with a smaller, predictable haircut.
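One way to picture the Dutch-auction mitigation is an incentive that ramps linearly from a floor to a cap after the position becomes liquidatable. This is a hedged sketch of the idea only: the library, the linear shape, and every parameter name are assumptions, and tracking when a position first became liquidatable is left out.

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.20;

/// Illustrative sketch. A linear ramp is one possible auction shape, not a standard.
library DutchIncentive {
    /// @param elapsed      seconds since the position became liquidatable
    /// @param duration     seconds over which the incentive ramps to its cap
    /// @param minIncentive starting incentive, WAD-scaled (0.01e18 = 1%)
    /// @param maxIncentive cap, WAD-scaled; must be >= minIncentive
    function incentiveAt(uint256 elapsed, uint256 duration, uint256 minIncentive, uint256 maxIncentive)
        internal
        pure
        returns (uint256)
    {
        // Also covers duration == 0, so the division below never divides by zero.
        if (elapsed >= duration) return maxIncentive;
        return minIncentive + (maxIncentive - minIncentive) * elapsed / duration;
    }
}
```

The first searcher for whom the current incentive covers gas and risk takes the liquidation, so the borrower pays close to the cheapest viable incentive rather than a fixed maximum.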
IRM math at extreme utilization
Common failure modes:
- U = 100%: a divide-by-zero if your formula divides by 1 - U. Cap U or guard the divide.
- U > 100% (briefly during accrual): some accumulator updates can temporarily push U over 1 if borrow grows faster than supply. The IRM must not revert or produce nonsense in that window.
- Rate overflow. An adaptive IRM that ratchets up unboundedly will eventually overflow. Cap MAX_BORROW_RATE and clamp.
- Rate underflow / zero rate. Some PID controllers can drive rate to zero or negative if utilization is very low for very long. Floor at MIN_BORROW_RATE.
- Compounding precision. Discrete compounding over short intervals with very high rates can produce noticeable rounding drift. Use Taylor expansions or rpow-style integer compounding with care.
```solidity
// rpow: exponentiate a per-second rate over dt seconds using exponentiation-by-squaring
// MakerDAO-style. Be very careful with overflow on intermediates.
function rpow(uint256 x, uint256 n, uint256 base) internal pure returns (uint256 z) {
    assembly {
        switch x case 0 { switch n case 0 { z := base } default { z := 0 } }
        default {
            switch mod(n, 2) case 0 { z := base } default { z := x }
            let half := div(base, 2) // for rounding to nearest
            for { n := div(n, 2) } n { n := div(n, 2) } {
                let xx := mul(x, x)
                if iszero(eq(div(xx, x), x)) { revert(0, 0) } // x * x overflowed
                let xxRound := add(xx, half)
                if lt(xxRound, xx) { revert(0, 0) } // rounding add overflowed
                x := div(xxRound, base)
                if mod(n, 2) {
                    let zx := mul(z, x)
                    if and(iszero(iszero(x)), iszero(eq(div(zx, x), z))) { revert(0, 0) } // z * x overflowed
                    let zxRound := add(zx, half)
                    if lt(zxRound, zx) { revert(0, 0) }
                    z := div(zxRound, base)
                }
            }
        }
    }
}
```
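The utilization and rate guards from the list above can be sketched as two small helpers. The bounds are hypothetical values (roughly 0.1% and 200% APR expressed per second in WAD); MIN_BORROW_RATE and MAX_BORROW_RATE follow the names used in the text, everything else is an assumption.

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.20;

/// Illustrative sketch. The constants are example values, not recommendations.
library IrmGuards {
    uint256 internal constant WAD = 1e18;
    // Per-second rates in WAD. The uint256 cast forces integer division.
    uint256 internal constant MIN_BORROW_RATE = uint256(0.001e18) / 365 days; // ~0.1% APR
    uint256 internal constant MAX_BORROW_RATE = uint256(2e18) / 365 days;     // ~200% APR

    /// Utilization in WAD, capped at 100% so U > 1 during accrual cannot leak into the curve.
    function utilization(uint256 totalBorrow, uint256 totalSupply) internal pure returns (uint256) {
        if (totalSupply == 0) return totalBorrow == 0 ? 0 : WAD;
        uint256 u = totalBorrow * WAD / totalSupply;
        return u > WAD ? WAD : u;
    }

    /// Clamp whatever the curve or controller produced into [MIN_BORROW_RATE, MAX_BORROW_RATE].
    function clampRate(uint256 rate) internal pure returns (uint256) {
        if (rate < MIN_BORROW_RATE) return MIN_BORROW_RATE;
        if (rate > MAX_BORROW_RATE) return MAX_BORROW_RATE;
        return rate;
    }
}
```

Capping U at WAD handles the U > 100% window, but a formula that divides by 1 - U still needs its own guard at exactly 100% (or a cap slightly below WAD).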
Governance attacks
Five flavors:
- Flash governance. Attacker borrows governance tokens, votes, executes a malicious proposal in one tx. Mitigation: snapshot at a past block, not at execution time.
- Timelock bypass. A proposal that calls a privileged function which is not behind the timelock. Mitigation: every privileged function is behind the same timelock.
- Voter bribery. Bribe pools (e.g., Hidden Hand-style markets) let attackers rent voting power. Mitigation: long lock-ups, vote-escrowed tokens, or move risk decisions out of governance.
- Multisig compromise. The "guardian" or "owner" multisig is the highest-value target. Mitigation: hardware-backed signers, geographic separation, social-recovery.
- Parameter-griefing. A proposal sets LLTV / IRM to malicious values. Mitigation: parameter allow-lists; immutable per-market parameters.
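The parameter allow-list mitigation can be sketched as follows: governance may only add LLTV values to an allow-list, and each market fixes its LLTV at creation and never changes it. This is a simplified illustration; the contract, its errors, and the bytes32 market id are hypothetical, and the owner is assumed to be the timelock.

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.20;

/// Illustrative sketch. Allow-listed, immutable per-market parameters.
contract MarketParamsSketch {
    error NotOwner(address caller);
    error LltvOutOfRange(uint256 lltv);
    error LltvNotEnabled(uint256 lltv);
    error MarketAlreadyCreated(bytes32 id);

    uint256 internal constant WAD = 1e18;

    address public immutable owner; // assumed to be the governance timelock
    mapping(uint256 => bool) public isLltvEnabled;
    mapping(bytes32 => uint256) public lltvOf; // written once per market, never updated

    constructor(address owner_) {
        owner = owner_;
    }

    /// Governance can widen the allow-list, but only with values strictly between 0 and 100%.
    function enableLltv(uint256 lltv) external {
        if (msg.sender != owner) revert NotOwner(msg.sender);
        if (lltv == 0 || lltv >= WAD) revert LltvOutOfRange(lltv);
        isLltvEnabled[lltv] = true;
    }

    /// Anyone can create a market, but only with an allow-listed LLTV, and only once per id.
    function createMarket(bytes32 id, uint256 lltv) external {
        if (!isLltvEnabled[lltv]) revert LltvNotEnabled(lltv);
        if (lltvOf[id] != 0) revert MarketAlreadyCreated(id);
        lltvOf[id] = lltv;
    }
}
```

There is deliberately no setter for an existing market's LLTV, so a malicious proposal can at worst enable new values for future markets.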
ERC-20 quirks
| Quirk | Token | Failure mode | Fix |
|---|---|---|---|
| No return value | USDT | A call through an interface declaring returns (bool) reverts when decoding USDT's empty return data | Use SafeERC20 / safeTransfer |
| Fee-on-transfer | Some meme tokens, PAXG | Received amount < sent amount | Measure balance before/after; reject or handle |
| Rebasing | stETH, AMPL | Balance changes without transfer | Use wrapped variant (wstETH); record shares |
| Blocklist / pausable | USDC, USDT | Transfer reverts if sender/receiver is blocked | Have a withdrawal recovery path |
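| Approve race | Any token using plain approve | Spender front-runs an allowance change and spends both the old and the new allowance | Use increaseAllowance / Permit2 |
| Permit | EIP-2612 tokens | A front-run permit consumes the signature, so the victim's bundled permit + action call reverts | Use try/catch around permit |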
| Decimals weirdness | USDC (6), WBTC (8) | Hardcoded 18 decimals → 1e12 errors | Read decimals() or use ORACLE_SCALE |
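The first two rows (missing return values and fee-on-transfer) can be handled at the point where tokens enter the protocol. The sketch below uses a low-level call so that tokens returning nothing are accepted, then compares balances to detect a transfer fee. It stands in for SafeERC20 only to stay self-contained; the library and error names are assumptions.

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.20;

interface IERC20Balance {
    function balanceOf(address account) external view returns (uint256);
}

/// Illustrative sketch. Pull tokens and require that the full amount arrived.
library TokenIntake {
    error TransferFailed(address token);
    error FeeOnTransferNotSupported(uint256 expected, uint256 received);

    function pullExact(address token, address from, uint256 amount) internal {
        // A call to an address without code succeeds silently, so check for code first.
        if (token.code.length == 0) revert TransferFailed(token);

        uint256 balanceBefore = IERC20Balance(token).balanceOf(address(this));

        // Low-level call: accepts tokens that return nothing (USDT-style) as well as
        // tokens that return a bool.
        (bool ok, bytes memory ret) = token.call(
            abi.encodeWithSignature("transferFrom(address,address,uint256)", from, address(this), amount)
        );
        if (!ok || (ret.length != 0 && !abi.decode(ret, (bool)))) revert TransferFailed(token);

        // Fee-on-transfer tokens deliver less than `amount`; reject instead of mis-accounting.
        uint256 received = IERC20Balance(token).balanceOf(address(this)) - balanceBefore;
        if (received != amount) revert FeeOnTransferNotSupported(amount, received);
    }
}
```

Rejecting is the simpler policy; a protocol that wants to support fee-on-transfer tokens would credit `received` instead of `amount`.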
A pre-merge checklist
Read every PR through this filter
- Does every external entry call accrueInterest first?
- Does every share/asset conversion round in protocol favor?
- Does the health check happen after any callback?
- Are oracle reads guarded by staleness + positivity?
- Are all external transfers wrapped in SafeERC20?
- Are state mutations done before external calls (CEI)?
- Are events emitted for every state change (for indexers / audit trail)?
- Do custom errors include the offending values (for debugging)?
- Do new functions live behind the same authorization gates as siblings?
- Are there fuzz / invariant tests covering the new state transitions?
- Was forge inspect storage diffed against main?
- Was forge snapshot diffed against main?