# Gas & EVM Mastery
The second pillar interviewers probe. AMM math gets the headline; gas mastery is the daily reality. Every swap, every LP action, every quoter call is paid by a user — your job is to make the cost reasonable.
## The gas mental model
Carry two numbers in your head:
- Cold SLOAD = 2,100 gas. Warm SLOAD = 100 gas. The single most important fact about EVM gas. Everything else is corollary.
- SSTORE from zero to nonzero on a cold slot = 22,100 gas (20,000 set + 2,100 cold access). A warm nonzero → nonzero change = 2,900 gas; a no-op write = 100. Writes are catastrophic on the cold path.
Implications for DEX design:
- Pack hot state into one slot (v3's `slot0` pattern). Read it once at the top of the function.
- Make as many fields `immutable` as possible — zero SLOAD cost.
- Use transient storage for per-tx state (reentrancy locks, intermediate values).
- Cache repeated reads in local variables — `uint256 r0 = reserve0;`.
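As a concrete sketch of the transient-storage point, here is a minimal Cancun-era reentrancy lock (the slot choice is arbitrary; this is an illustration, not any particular library's implementation):

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.24;

// Sketch: a per-transaction reentrancy lock built on EIP-1153 transient
// storage. TSTORE/TLOAD cost a flat 100 gas and the slot is cleared
// automatically at the end of the transaction — no 22,100-gas cold
// SSTORE on lock, no refund bookkeeping on unlock.
abstract contract TransientLock {
    modifier nonReentrant() {
        assembly {
            // Transient slot 0 is an arbitrary, illustrative choice.
            if tload(0) { revert(0, 0) }
            tstore(0, 1)
        }
        _;
        assembly { tstore(0, 0) }
    }
}
```

Compare this with the classic storage-based lock, which pays a warm SSTORE on entry and exit and must be pre-initialized to a nonzero sentinel to dodge the cold zero → nonzero cost.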
## Opcodes that matter
| Opcode | Gas | Notes |
|---|---|---|
| SLOAD | 2100 / 100 | Cold first read; warm afterwards in same tx |
| SSTORE | 22100 / 2900 / 100 / refund | Set vs reset vs no-op; cold access adds 2,100 |
| TLOAD (Cancun) | 100 | Transient. Always warm. |
| TSTORE (Cancun) | 100 | Transient. Always cheap. |
| MCOPY (Cancun) | 3 + 3·words | Memory-to-memory copy. Replaces the identity-precompile dance. |
| CALL | 2600 / 100 + 9000 for value transfer | Cold/warm address access |
| EXTCODESIZE | 2600 / 100 | Same. Don't use as "isContract" — code size is zero during construction; prefer tx.origin-aware patterns for EOA gating. |
| EXTCODEHASH | 2600 / 100 | Used for CREATE2 address prediction |
| KECCAK256 | 30 + 6·words | Cheap, but every storage mapping access is one |
| BLOBHASH (Cancun) | 3 | Read versioned hash of a blob in the current tx |
For DEX hot paths, the dominant costs are typically: 1–2 SLOADs (`slot0`), 1–2 SSTOREs (updated `slot0` + fee growth), and token transfers (each = a CALL plus the receiving contract's logic). Optimizing past that requires Yul or an architecture change.
## Storage layout & packing
Solidity's auto-layout rules:
- Variables occupy slots in declaration order.
- Smaller-than-32-byte variables pack into the same slot if they fit.
- Struct members follow the same packing rules within the struct's own slots.
- Mappings and dynamic arrays consume one slot for their root; a mapping value lives at `keccak256(abi.encode(key, slot))`.
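That last rule is easy to verify by hand. A sketch, assuming a `mapping(address => uint256) balances` declared at slot 2 (both names and the slot number are illustrative):

```solidity
// Sketch: where does balances[user] live if the mapping's root is
// slot 2? Per Solidity's layout rules, the value sits at
// keccak256(abi.encode(key, rootSlot)) — key and slot each padded
// to 32 bytes.
function balanceSlot(address user) internal pure returns (bytes32) {
    return keccak256(abi.encode(user, uint256(2)));
}
```

This is also why every mapping access costs a KECCAK256 (30 + 6·words) on top of the SLOAD itself.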
```solidity
// BAD — three slots
contract Bad {
    uint128 a; // slot 0
    uint256 b; // slot 1 (uint128 a can't pack with uint256 b)
    uint128 c; // slot 2
}

// GOOD — two slots
contract Good {
    uint128 a; // slot 0 — lower 16 bytes
    uint128 c; // slot 0 — upper 16 bytes (packs!)
    uint256 b; // slot 1
}
```
```solidity
// BEST for hot reads — one slot
struct Slot0 {
    uint160 sqrtPriceX96;
    int24 tick;
    uint16 obsIndex;
    uint16 obsCardinality;
    uint16 obsCardinalityNext;
    uint8 feeProtocol;
    bool unlocked;
} // 248 bits → 1 slot
```

The `forge inspect <Contract> storage-layout` command shows the layout. Read it for every contract you touch.
## Calldata vs memory
Calldata is read-only, externally provided, and free to read (other than the 4-or-16 gas/byte transmission cost the user already paid). Memory must be allocated, written, and grows quadratically in expansion cost.
```solidity
// Memory copy — wastes gas
function quote(uint256[] memory amounts) external view returns (uint256) {
    return amounts[0] * 2;
}

// Calldata read — cheap
function quote(uint256[] calldata amounts) external view returns (uint256) {
    return amounts[0] * 2;
}
```

Rule of thumb: every external function with a complex parameter (struct, array, bytes) should default to `calldata` unless you need to mutate it.
Memory expansion cost is roughly:

cost(words) = 3·words + words²/512

The quadratic term bites at large allocations, and the cost is paid once per high-water mark, not per access. Expanding to 10KB (320 words) costs 3·320 + 320²/512 ≈ 1,160 gas; expanding to 100KB (3,200 words) costs ≈ 29,600 gas.
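The formula above as a small helper, useful for back-of-envelope checks (the function name is illustrative):

```solidity
// Sketch: total memory cost for a given highest word touched,
// per cost(words) = 3·words + words²/512 (integer division).
function memoryCost(uint256 words) internal pure returns (uint256) {
    return 3 * words + (words * words) / 512;
}
// memoryCost(320)  -> 1,160   (10KB)
// memoryCost(3200) -> 29,600  (100KB)
```

The marginal cost of expanding from A to B words is `memoryCost(B) - memoryCost(A)`, which is why appending to an already-large memory region is the expensive case.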
## Function selector ordering
Solidity's dispatcher scans selectors roughly linearly (the compiler switches to a split search once a contract has many functions). Each comparison costs ~22 gas, so a function deep in the dispatch order pays a couple hundred gas more on every call than one matched first.
For a router with 20 selectors where swapExactTokensForTokens is hit 90% of the time, that's free gas if you reorder.
The Solady-popularized trick: name your hot functions so their 4-byte selectors sort first. Or use via-ir with explicit function ordering hints. Or write a custom Yul dispatcher.
```yul
// Yul dispatcher sketch — selector switch
object "Router" {
    code {
        let s := shr(224, calldataload(0))
        switch s
        case 0x12345678 { /* swap path */ }
        case 0x9abcdef0 { /* mint path */ }
        default { revert(0, 0) }
    }
}
```

## Custom errors
Pre-0.8.4: `require(cond, "SOME_STRING")` embeds every string byte in the deployed bytecode (200 gas per byte at deployment) and ABI-encodes the string into the revert payload.
Post-0.8.4: if (!cond) revert MyError(arg); compresses to a 4-byte selector plus encoded args. Smaller bytecode, smaller revert payload, named in clients.
```solidity
// Old
require(amount > 0, "INSUFFICIENT_INPUT_AMOUNT");

// New — saves bytecode and gas on revert path
error InsufficientInput(uint256 amount);
if (amount == 0) revert InsufficientInput(amount);
```

For DEX peripheries with dozens of revert conditions, this saves real deployment cost. Use it everywhere new.
## Yul hot paths
Where assembly actually helps:
- Custom hashing. Computing position keys via Solidity is fine; doing it via `mstore` + `keccak256` in assembly is ~30% cheaper.
- 512-bit math. `FullMath.mulDiv` is mostly Yul. Reading it teaches you the canonical pattern.
- Tick bitmap operations. Find-next-set-bit-in-a-uint256 is a beautiful Yul snippet.
- Selector dispatch and ABI-encoding shortcuts when you control the call shape.
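On the bitmap point, a minimal sketch of the core building block — isolating the lowest set bit of a word (this is the idea behind next-initialized-tick searches, not Uniswap's exact code):

```solidity
// Sketch: isolate the least significant set bit of a word.
// Two's-complement identity: x & (0 - x) keeps only the lowest 1 bit.
// Uniswap v3's TickBitmap combines masks with a most/least-significant-
// bit routine built on this kind of trick.
function lsb(uint256 x) internal pure returns (uint256 r) {
    assembly {
        r := and(x, sub(0, x))
    }
}
// lsb(0b10100) -> 0b00100
```

With the lowest (or highest) set bit in hand, "find the next initialized tick" reduces to masking off bits below the current position and calling a routine like this on the remainder.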
```solidity
// FullMath.mulDiv — Remco Bloemen's algorithm, in Yul
function mulDiv(uint256 a, uint256 b, uint256 denominator) internal pure returns (uint256 result) {
    uint256 prod0; uint256 prod1;
    assembly {
        let mm := mulmod(a, b, not(0))
        prod0 := mul(a, b)
        prod1 := sub(sub(mm, prod0), lt(mm, prod0))
    }
    if (prod1 == 0) { unchecked { return prod0 / denominator; } }
    require(denominator > prod1, "FullMath: overflow");
    // ... long-division step omitted for brevity
}
```

Don't write `mulDiv` from scratch in an interview — cite it, explain why it's needed (you can compute a·b/c even when a·b overflows 256 bits), and use it.
## Branchless math
Branches cost ~10 gas each on JUMPI. They also defeat some compiler optimizations. For tight inner loops (e.g. Newton iterations), branchless variants win.
```solidity
// Branched min
function min(uint256 a, uint256 b) internal pure returns (uint256) {
    return a < b ? a : b;
}

// Branchless min — compiles tighter
function minBL(uint256 a, uint256 b) internal pure returns (uint256 r) {
    assembly { r := xor(a, mul(xor(a, b), lt(b, a))) }
}
```

Don't reach for these reflexively. Profile first; readability second.
## The "shave 20% off this function" exercise
A common live-coding round. They show you a 40-line function and ask for measurable gas wins. The reliable plays:
- Cache storage reads. Repeated `reserve0` → `uint256 r0 = reserve0;`.
- Mark unchanging vars `immutable`/`constant`.
- `unchecked` arithmetic where you can prove no overflow.
- Combine SSTOREs — pack multiple writes into one slot, write once.
- Switch require strings to custom errors.
- Calldata-ify struct params.
- Short-circuit checks early.
- Replace external calls with `staticcall` where applicable.
- Use `++i` instead of `i++` in loops (tiny but free).
- Inline a one-call helper to skip the JUMP.
If you walk through the list in this order in an interview, you'll find 15-25% in most contracts.
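Several of those plays combined, as a before/after sketch (assuming a `uint256[] public rewards` storage array; names are illustrative):

```solidity
uint256[] public rewards; // assumed storage array

// Before: re-reads rewards.length from storage on every iteration,
// checked increment, i++.
function sumBefore() external view returns (uint256 total) {
    for (uint256 i = 0; i < rewards.length; i++) {
        total += rewards[i];
    }
}

// After: cache the length once (one SLOAD instead of N), use
// unchecked ++i — the loop bound already proves i cannot overflow.
function sumAfter() external view returns (uint256 total) {
    uint256 len = rewards.length;
    for (uint256 i = 0; i < len; ) {
        total += rewards[i];
        unchecked { ++i; }
    }
}
```

The array-element SLOADs themselves remain; past this point the win comes from changing the data layout, not the loop.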
## EVM forks — what's relevant
| Fork | Year | What DEX engineers care about |
|---|---|---|
| Byzantium | 2017 | Cheap modular exponentiation precompile; static call |
| Constantinople | 2019 | CREATE2 (deterministic addresses) — vanity, factory patterns |
| Istanbul | 2019 | CHAINID, repriced SLOAD |
| Berlin | 2021 | Access lists (EIP-2929 warm/cold gas model) |
| London | 2021 | EIP-1559 base fee; affects gas auctions and MEV |
| Shanghai | 2023 | PUSH0 (1-byte zero push); withdrawals |
| Cancun | 2024 | EIP-1153 transient storage; EIP-4844 blobs; MCOPY; SELFDESTRUCT semantics change |
| Prague / Pectra | 2025 | EIP-7702 (EOA delegation); EIP-3074-equivalent; expanded precompiles |
With 7702, an EOA can designate contract code to execute on its behalf, letting it act as a smart account (the delegation is set by signature and persists until replaced). This collapses Permit2 + flash flows into a single signature flow — expect router redesigns to lean on 7702 over the next year.