Section B · Technical Core · Secondary

Gas & EVM Mastery

The second pillar interviewers probe. AMM math gets the headline; gas mastery is the daily reality. Every swap, every LP action, every quoter call is paid by a user — your job is to make the cost reasonable.

The gas mental model

Carry two numbers in your head:

  • Cold SLOAD = 2,100 gas. Warm SLOAD = 100 gas. The single most important fact about EVM gas. Everything else is corollary.
  • Cold SSTORE (zero → nonzero) = 22,100 gas. Overwriting an existing nonzero value = 5,000 gas cold, 2,900 warm (or 100 if the value doesn't change). Writes are catastrophic on the cold path.
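A quick model of what the warm/cold split means for repeated reads of one slot in a single transaction (costs per EIP-2929; the ~3 gas for reading a cached local is approximate):

```python
def repeated_sloads(n):
    # first read of the slot is cold, the rest are warm
    return 2_100 + 100 * (n - 1)

def cached_reads(n):
    # one cold SLOAD, then ~3 gas per read of the cached local variable
    return 2_100 + 3 * (n - 1)

assert repeated_sloads(5) == 2_500
assert cached_reads(5) == 2_112
```

Even with warm reads at 100 gas, caching pays for itself by the second read.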

Implications for DEX design:

  • Pack hot state into one slot (v3 slot0 pattern).
  • Read it once at the top of the function.
  • Make as many fields immutable as possible — zero SLOAD cost.
  • Use transient storage for per-tx state (reentrancy locks, intermediate values).
  • Cache repeated reads in local variables — uint256 r0 = reserve0;.

Opcodes that matter

| Opcode | Gas | Notes |
| --- | --- | --- |
| SLOAD | 2,100 / 100 | Cold first read; warm afterwards in same tx |
| SSTORE | 22,100 / 5,000 / 100 / refund | Depends on zero/nonzero before and after |
| TLOAD (Cancun) | 100 | Transient. Always warm. |
| TSTORE (Cancun) | 100 | Transient. Always cheap. |
| MCOPY (Cancun) | 3 + 3·words | Memory-to-memory copy. Replaces the identity-precompile dance. |
| CALL | 2,600 / 100 (+9,000 if value > 0) | Cold/warm address access |
| EXTCODESIZE | 2,600 / 100 | Same. Don't use as "isContract" — use tx.origin-aware patterns. |
| EXTCODEHASH | 2,600 / 100 | Used for CREATE2 address prediction |
| KECCAK256 | 30 + 6·words | Cheap, but every storage mapping access is one |
| BLOBHASH (Cancun) | 3 | Read versioned hash of a blob in the current tx |

For DEX hot paths, the dominant costs are typically 1-2 SLOADs (slot0), 1-2 SSTOREs (updated slot0 plus fee growth), and token transfers (each one a CALL plus the receiving contract's logic). Optimizing past that requires Yul or an architectural change.
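Those pieces can be tallied into a back-of-envelope floor for a minimal swap. The figures below are rough EIP-2929-era prices and deliberately ignore the token contracts' own logic, calldata, and memory costs:

```python
# Back-of-envelope gas floor for a minimal swap — illustrative only.
BASE_TX     = 21_000  # intrinsic cost of every transaction
COLD_SLOAD  = 2_100   # slot0 read
WARM_SSTORE = 2_900   # slot0 rewrite (nonzero -> nonzero, slot already warm)
COLD_CALL   = 2_600   # cold account access, one per token transfer

floor = BASE_TX + COLD_SLOAD + WARM_SSTORE + 2 * COLD_CALL
assert floor == 31_200  # before any token-side logic runs
```

Real swaps land well above this once ERC-20 balance updates and event emission are counted; the point is how much of the budget is consumed before your AMM math even starts.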

Storage layout & packing

Solidity's auto-layout rules:

  • Variables occupy slots in declaration order.
  • Smaller-than-32-byte variables pack into the same slot if they fit.
  • Structs follow the same rule, slot-internally.
  • Mappings and dynamic arrays consume one slot for their root; the actual data lives at keccak(key, slot).
// BAD — three slots
contract Bad {
    uint128 a;   // slot 0
    uint256 b;   // slot 1  (uint128 a can't pack with uint256 b)
    uint128 c;   // slot 2
}

// GOOD — two slots
contract Good {
    uint128 a;   // slot 0 — lower 16 bytes
    uint128 c;   // slot 0 — upper 16 bytes (packs!)
    uint256 b;   // slot 1
}

// BEST for hot reads — one slot
struct Slot0 {
    uint160 sqrtPriceX96;
    int24   tick;
    uint16  obsIndex;
    uint16  obsCardinality;
    uint16  obsCardinalityNext;
    uint8   feeProtocol;
    bool    unlocked;
} // 248 bits → 1 slot
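The packing rules are mechanical enough to model off-chain. A small Python sketch (assign_slots is a hypothetical helper, not a real tool) reproduces the slot counts of all three examples above:

```python
def assign_slots(bit_sizes):
    """Assign storage slots per Solidity's rules: declaration order,
    pack a variable into the current slot only if it still fits."""
    slots, current_slot, used_bits = [], 0, 0
    for bits in bit_sizes:
        if used_bits + bits > 256:  # doesn't fit — start a new slot
            current_slot += 1
            used_bits = 0
        slots.append(current_slot)
        used_bits += bits
    return slots

# Bad: uint128, uint256, uint128 — three slots
assert assign_slots([128, 256, 128]) == [0, 1, 2]
# Good: uint128, uint128, uint256 — two slots
assert assign_slots([128, 128, 256]) == [0, 0, 1]
# Slot0: 160 + 24 + 16 + 16 + 16 + 8 + 8 = 248 bits — one slot
slot0 = [160, 24, 16, 16, 16, 8, 8]
assert sum(slot0) == 248
assert assign_slots(slot0) == [0] * 7
```

(bool occupies one byte in the layout, hence the 8-bit entry for unlocked.)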

The forge inspect <Contract> storage-layout command shows the layout. Read it for every contract you touch.

Calldata vs memory

Calldata is read-only, externally provided, and free to read (other than the 4-or-16 gas/byte transmission cost the user already paid). Memory must be allocated, written, and grows quadratically in expansion cost.

// Memory copy — wastes gas
function quote(uint256[] memory amounts) external view returns (uint256) {
    return amounts[0] * 2;
}

// Calldata read — cheap
function quote(uint256[] calldata amounts) external view returns (uint256) {
    return amounts[0] * 2;
}

Rule of thumb: every external function with a complex parameter (struct, array, bytes) should default to calldata unless you need to mutate it.

Memory cost is roughly:

cost(words) = 3·words + words²/512

The quadratic term bites at large allocations: expanding memory to 10 KB costs ~1,160 gas, and to 100 KB ~29,600 gas. It's a one-time expansion cost for the region, not a per-access fee.
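Plugging the formula in confirms the scale (a quick sketch; word counts assume 32-byte words):

```python
def mem_cost(words):
    # memory expansion cost: 3·words + floor(words² / 512)
    return 3 * words + words * words // 512

KB = 1024
assert mem_cost(10 * KB // 32) == 1_160    # 10 KB region: 320 words
assert mem_cost(100 * KB // 32) == 29_600  # 100 KB region: 3,200 words
```

At 10 KB the linear term still dominates (960 vs 200); by 100 KB the quadratic term is twice the linear one.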

Function selector ordering

Solidity's dispatcher compares selectors in sequence (newer compilers switch to a binary search once the function count is large enough). Each comparison costs ~22 gas, so the 10th-listed function on your contract pays ~200 gas more on every call than the 1st.

For a router with 20 selectors where swapExactTokensForTokens is hit 90% of the time, that's free gas if you reorder.
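A rough tally of what reordering buys, using the ~22 gas per comparison figure:

```python
COMPARISON_GAS = 22  # approximate cost of one selector comparison

def dispatch_overhead(position):
    """Extra gas paid before reaching a linearly-dispatched
    function at the given 0-indexed position."""
    return COMPARISON_GAS * position

saving = dispatch_overhead(9) - dispatch_overhead(0)  # 10th slot -> 1st slot
assert saving == 198
# at a 90% call share, the expected saving per call to the router:
assert round(0.9 * saving) == 178
```

Small per call, but it compounds across every transaction the router ever processes, at zero runtime risk.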

The Solady-popularized trick: mine your hot function names so their 4-byte selectors sort first in the dispatcher. Failing that, write a custom Yul dispatcher.

// Yul dispatcher sketch — switch on the 4-byte selector
object "Router" {
  code {
    let s := shr(224, calldataload(0))
    switch s
    case 0x12345678 { /* swap path */ }
    case 0x9abcdef0 { /* mint path */ }
    default { revert(0, 0) }
  }
}

Custom errors

Pre-0.8.4: require(cond, "TOO_LONG_STRING") embeds the string in bytecode (roughly 200 gas of deployment cost per byte, rounded up to 32-byte words) and copies it to memory on every revert.

Post-0.8.4: if (!cond) revert MyError(arg); compresses to a 4-byte selector plus encoded args. Smaller bytecode, smaller revert payload, named in clients.

// Old
require(amount > 0, "INSUFFICIENT_INPUT_AMOUNT");

// New — saves bytecode and gas on revert path
error InsufficientInput(uint256 amount);
if (amount == 0) revert InsufficientInput(amount);

For DEX peripheries with dozens of revert conditions, this saves real deployment cost. Use custom errors in all new code.
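The revert-payload saving is easy to quantify from the ABI encoding rules (helper names below are illustrative):

```python
def error_string_payload(msg: str) -> int:
    # Error(string) revert data: 4-byte selector + offset word
    # + length word + string bytes padded to a 32-byte boundary
    padded = (len(msg.encode()) + 31) // 32 * 32
    return 4 + 32 + 32 + padded

def custom_error_payload(n_word_args: int) -> int:
    # custom error: 4-byte selector + one 32-byte word per static argument
    return 4 + 32 * n_word_args

assert error_string_payload("INSUFFICIENT_INPUT_AMOUNT") == 100
assert custom_error_payload(1) == 36  # InsufficientInput(uint256)
```

100 bytes down to 36 on the wire, on top of the bytecode no longer carrying the string at all.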

Yul hot paths

Where assembly actually helps:

  1. Custom hashing. Computing position keys in plain Solidity is fine; doing the mstore + keccak256 sequence in assembly is ~30% cheaper.
  2. 512-bit math. FullMath.mulDiv is mostly Yul. Reading it teaches you the canonical pattern.
  3. Tick bitmap operations. Find-next-set-bit-in-uint256 is a beautiful Yul snippet.
  4. Selector dispatch and ABI-encoding shortcuts when you control the call shape.
// FullMath.mulDiv — Remco Bloemen's algorithm, in Yul
function mulDiv(uint256 a, uint256 b, uint256 denominator) internal pure returns (uint256 result) {
    uint256 prod0; uint256 prod1;
    assembly {
        let mm := mulmod(a, b, not(0))
        prod0 := mul(a, b)
        prod1 := sub(sub(mm, prod0), lt(mm, prod0))
    }
    if (prod1 == 0) { unchecked { return prod0 / denominator; } }
    require(denominator > prod1, "FullMath: overflow");
    // ... long-division step omitted for brevity
}

Don't write mulDiv from scratch in an interview — cite it, explain why it's needed (you can compute a·b/c even when a·b overflows 256 bits), and use it.
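Python's bignums make the "why" easy to demonstrate; the hypothetical mul_div below mirrors FullMath's overflow check (denominator > prod1) without the long-division machinery:

```python
U256 = 2**256

def mul_div(a, b, d):
    # what FullMath computes: floor(a·b / d) via a full 512-bit intermediate
    prod1 = (a * b) // U256          # high 256 bits of the product
    assert d > prod1, "result would overflow 256 bits"
    return a * b // d

a, b, d = 2**200, 2**100, 2**60
assert a * b >= U256                 # the raw product overflows uint256...
r = mul_div(a, b, d)
assert r == 2**240 and r < U256      # ...but the quotient fits
```

On-chain there are no bignums, which is exactly why the mulmod trick for prod1 and the long-division step exist.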

Branchless math

Branches cost ~10 gas each on JUMPI. They also defeat some compiler optimizations. For tight inner loops (e.g. Newton iterations), branchless variants win.

// Branched min
function min(uint256 a, uint256 b) internal pure returns (uint256) {
    return a < b ? a : b;
}

// Branchless min — compiles tighter
function minBL(uint256 a, uint256 b) internal pure returns (uint256 r) {
    assembly { r := xor(a, mul(xor(a, b), lt(b, a))) }
}

Don't reach for these reflexively. Profile first; readability second.

The "shave 20% off this function" exercise

A common live-coding round. They show you a 40-line function and ask for measurable gas wins. The reliable plays:

  1. Cache storage reads. Repeated reserve0 reads become uint256 r0 = reserve0;.
  2. Mark unchanging vars immutable / constant.
  3. unchecked arithmetic where you can prove no overflow.
  4. Combine SSTOREs — pack multiple writes into one slot, write once.
  5. Switch require strings to custom errors.
  6. Calldata-ify struct params.
  7. Short-circuit checks early.
  8. Replace external calls with staticcall where applicable.
  9. Use ++i instead of i++ in loops (tiny but free).
  10. Inline a one-call helper to skip the JUMP.

If you walk through the list in this order in an interview, you'll find 15-25% in most contracts.
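Play #4 is usually the single biggest win. A quick tally, assuming two values that would otherwise each trigger a fresh zero-to-nonzero write (ignoring the few tens of gas the packing shifts and masks cost):

```python
COLD_SSTORE_ZERO_TO_NONZERO = 22_100

# two fresh values in separate slots vs packed into one slot
separate = 2 * COLD_SSTORE_ZERO_TO_NONZERO
packed   = 1 * COLD_SSTORE_ZERO_TO_NONZERO
assert separate - packed == 22_100  # one whole cold write saved
```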

EVM forks — what's relevant

| Fork | Year | What DEX engineers care about |
| --- | --- | --- |
| Byzantium | 2017 | Cheap modular-exponentiation precompile; STATICCALL |
| Constantinople | 2019 | CREATE2 (deterministic addresses) — vanity, factory patterns |
| Istanbul | 2019 | CHAINID; repriced SLOAD |
| Berlin | 2021 | Access lists (EIP-2929 warm/cold gas model) |
| London | 2021 | EIP-1559 base fee; affects gas auctions and MEV |
| Shanghai | 2023 | PUSH0 (1-byte zero push); withdrawals |
| Cancun | 2024 | EIP-1153 transient storage; EIP-4844 blobs; MCOPY; SELFDESTRUCT semantics change |
| Prague / Pectra | 2025 | EIP-7702 (EOA delegation, the EIP-3074-equivalent); expanded precompiles |

DEX implications of Pectra (EIP-7702)

EOAs can temporarily act as smart accounts within a transaction. This collapses Permit2 + flash flows into a single signature flow — expect router redesigns to lean on 7702 over the next year.