Q1: What is the complete lifecycle of a transaction from user submission to finalization?
Overall flow
```
User / DApp
      │
      │ eth_sendRawTransaction (JSON-RPC)
      ▼
┌──────────────┐
│  RPC Server  │ ← Ch.13: transport, dispatch, reflection
└──────┬───────┘
       │
       ▼
┌──────────────┐
│TransactionAPI│ ← Ch.13: decode raw bytes → types.Transaction
└──────┬───────┘
       │ txPool.Add()
       ▼
┌──────────────┐
│ Transaction  │ ← Ch.8: validation, pending/queue, blob pool
│     Pool     │
└──────┬───────┘
       │ NewTxsEvent
       ▼
┌──────────────┐
│   Handler    │ ← Ch.12: broadcast to peers
└──────┬───────┘   sqrt(N) direct + rest hash announcement
       │
       │ CL calls ForkchoiceUpdated (triggers block building)
       ▼
┌──────────────┐
│    Miner     │ ← Ch.9: fillTransactions, sort by tip
└──────┬───────┘
       │ ApplyTransaction for each tx
       ▼
┌──────────────┐
│    State     │ ← Ch.6: preCheck → buyGas → EVM → refund → fee distribution
│  Transition  │
└──────┬───────┘
       │ evm.Call() or evm.Create()
       ▼
┌──────────────┐
│     EVM      │ ← Ch.7: interpreter loop, opcode execution, gas metering
└──────┬───────┘
       │ SSTORE, CREATE etc. modify state
       ▼
┌──────────────┐
│   StateDB    │ ← Ch.4: stateObject dirty map, journal rollback
└──────┬───────┘
       │ FinalizeAndAssemble
       ▼
┌──────────────┐
│    Block     │ ← Ch.9: compute state root, assemble block
│   Assembly   │
└──────┬───────┘
       │ CL: getPayload → newPayload → forkchoiceUpdated
       ▼
┌──────────────┐
│  BlockChain  │ ← Ch.10: InsertChain, validate, writeBlockWithState
└──────┬───────┘
       │ ChainHeadEvent
       ▼
┌──────────────┐
│   Storage    │ ← Ch.3/4/5: trie commit → TrieDB → Pebble/LevelDB + Freezer
│    Stack     │
└──────┬───────┘
       │ broadcast to peers
       ▼
┌──────────────┐
│ P2P Network  │ ← Ch.11/12: block propagation, lagging nodes sync
└──────┬───────┘
       │
       ▼
  Finalized ← CL finalizes, never rolled back
```

Stage 1: RPC arrival (Chapter 13)
The user calls eth_sendRawTransaction via HTTP/WebSocket/IPC, submitting a signed raw transaction.
```
HTTP POST
 → RPC Server decodes JSON
 → split method name on "_": service="eth", method="sendRawTransaction"
 → reflection lookup → TransactionAPI.SendRawTransaction
 → tx.UnmarshalBinary(rawBytes) → types.Transaction
 → SubmitTransaction() checks fee cap, EIP-155 protection
 → txPool.Add()
```

From this point, the transaction leaves the external world and enters geth internals.
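For orientation, here is a minimal client-side sketch of this stage using go-ethereum's ethclient; the endpoint URL is a placeholder, and the signed transaction is assumed to already exist. SendTransaction encodes the tx and issues eth_sendRawTransaction under the hood:

```go
package txsubmit

import (
	"context"
	"log"

	"github.com/ethereum/go-ethereum/core/types"
	"github.com/ethereum/go-ethereum/ethclient"
)

// Submit pushes an already-signed transaction into a node's RPC server.
func Submit(signedTx *types.Transaction) {
	// Placeholder endpoint; WebSocket and IPC URLs work the same way.
	client, err := ethclient.Dial("http://localhost:8545")
	if err != nil {
		log.Fatal(err)
	}
	defer client.Close()

	// SendTransaction marshals the tx to raw bytes and calls
	// eth_sendRawTransaction over JSON-RPC.
	if err := client.SendTransaction(context.Background(), signedTx); err != nil {
		log.Fatal(err)
	}
}
```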
Stage 2: Transaction pool validation and storage (Chapter 8)
The TxPool coordinator routes to the appropriate sub-pool based on transaction type:
```
TxPool.Add()
 ├─ Normal transactions (type 0/1/2/4) → LegacyPool
 └─ Blob transactions (type 3)         → BlobPool

Sub-pool validation:
 ├─ Signature recovery (ECDSA recover)
 ├─ Nonce check (not too low, not too large a gap)
 ├─ Balance check (can cover gasLimit × gasFeeCap + value)
 ├─ Gas limit check (doesn't exceed block gas limit)
 └─ Pool-level limits (account slots, global slots, minimum price)
```
Passes validation → pending map (ready for block inclusion). The pool then emits NewTxsEvent via event.Feed.
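A simplified sketch of the core checks, with illustrative types and field names rather than geth's actual pool code:

```go
package txpool

import (
	"errors"
	"math/big"
)

// Tx and Account are illustrative stand-ins for the real types.
type Tx struct {
	Nonce     uint64
	GasLimit  uint64
	GasFeeCap *big.Int
	Value     *big.Int
}

type Account struct {
	Nonce   uint64
	Balance *big.Int
}

// validateBasics mirrors the shape of the sub-pool checks: nonce ordering,
// worst-case cost coverage, and the block gas limit ceiling.
func validateBasics(tx *Tx, sender *Account, blockGasLimit uint64) error {
	if tx.Nonce < sender.Nonce {
		return errors.New("nonce too low")
	}
	if tx.GasLimit > blockGasLimit {
		return errors.New("tx gas limit exceeds block gas limit")
	}
	// Worst case, the sender pays gasLimit × gasFeeCap plus the transferred value.
	cost := new(big.Int).Mul(new(big.Int).SetUint64(tx.GasLimit), tx.GasFeeCap)
	cost.Add(cost, tx.Value)
	if sender.Balance.Cmp(cost) < 0 {
		return errors.New("insufficient funds to cover gasLimit × gasFeeCap + value")
	}
	return nil
}
```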
Stage 3: P2P broadcast (Chapter 12)
The handler's txBroadcastLoop() receives the event and executes a dual-layer broadcast:
```
BroadcastTransactions()
 ├─ ~sqrt(N) peers: send full transaction (TransactionsMsg)
 └─ remaining peers: send only hash (NewPooledTransactionHashesMsg)
         │
         ▼ remote peers receiving hash
    TxFetcher 3-stage pipeline
     ├─ wait 500ms (direct broadcast may arrive)
     ├─ didn't arrive → queue for request
     └─ send GetPooledTransactionsMsg to fetch
```
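The square-root split itself is a few lines; a sketch with an illustrative Peer interface (method names invented for the example):

```go
package broadcast

import "math"

// Peer is an illustrative stand-in for an eth protocol peer.
type Peer interface {
	SendTransactions(hashes []string)     // full tx payloads in the real protocol
	AnnounceTransactions(hashes []string) // hash-only announcement
}

// broadcast sends full transactions to ~sqrt(N) peers and only announces
// hashes to the rest, mirroring the dual-layer strategy described above.
func broadcast(peers []Peer, hashes []string) {
	direct := int(math.Sqrt(float64(len(peers))))
	for i, p := range peers {
		if i < direct {
			p.SendTransactions(hashes)
		} else {
			p.AnnounceTransactions(hashes)
		}
	}
}
```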
Within seconds → virtually every node in the network has this transaction.

Stage 4: Block building trigger (Chapter 9)
The transaction waits in the pool. When the consensus layer decides “you are this slot’s proposer”:
```
CL calls ForkchoiceUpdated (with payloadAttributes)
 → Engine API notifies miner
 → BuildPayload() starts:
    ├─ Immediately build empty block (guarantee, never miss a slot)
    └─ Background goroutine repeatedly builds full blocks
        ├─ 0s: first full build
        ├─ 2s: rebuild (new txs may have arrived)
        ├─ 4s: rebuild again
        └─ 6s: CL calls GetPayload, return best version
```

Stage 5: Transaction execution (Chapter 6)
During each build, fillTransactions() pulls transactions from the pool, sorts them by effective tip, and executes them one by one:
```
For each transaction:
  core.ApplyTransaction() → stateTransition.execute()
   ├─ preCheck()         validate nonce, balance
   ├─ buyGas()           pre-deduct gasLimit × gasFeeCap
   ├─ EVM dispatch       evm.Call() or evm.Create()
   ├─ calcRefund()       refund cap = gasUsed / 5 (EIP-3529)
   ├─ returnGas()        return remaining gas to sender
   └─ fee distribution   tip → coinbase, baseFee → burned
```

If execution fails, state rolls back to the pre-transaction snapshot and the gas pool is restored; failed transactions leave no trace.
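As a worked example of the final fee-distribution step, here is an illustrative helper following the EIP-1559 rules above (not geth's actual accounting code):

```go
package fees

import "math/big"

// split computes the EIP-1559 fee distribution for one transaction:
// the effective tip goes to the coinbase, the base fee portion is burned.
func split(gasUsed uint64, baseFee, gasTipCap, gasFeeCap *big.Int) (toCoinbase, burned *big.Int) {
	// effectiveTip = min(gasTipCap, gasFeeCap - baseFee)
	effectiveTip := new(big.Int).Sub(gasFeeCap, baseFee)
	if gasTipCap.Cmp(effectiveTip) < 0 {
		effectiveTip.Set(gasTipCap)
	}
	used := new(big.Int).SetUint64(gasUsed)
	toCoinbase = new(big.Int).Mul(effectiveTip, used)
	burned = new(big.Int).Mul(baseFee, used)
	return toCoinbase, burned
}

// Example: gasUsed=21000, baseFee=10 gwei, tipCap=2 gwei, feeCap=20 gwei
// → coinbase receives 21000×2 gwei, and 21000×10 gwei is burned.
```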
Stage 6: EVM execution (Chapter 7)
Inside evm.Call(), the interpreter loop runs the contract bytecode:
```
for {
	① op = contract.GetOp(pc)         // fetch opcode
	② operation = jumpTable[op]       // look up gas cost and handler
	③ validate stack depth
	④ deduct constantGas
	⑤ compute and deduct dynamicGas   // e.g., SLOAD cold/warm
	⑥ expand memory and charge
	⑦ operation.execute()             // execute!
	pc++
}
```

State-modifying opcodes (SSTORE, CREATE, etc.) write to StateDB's dirty map, protected by the journal for rollback.
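To make the loop concrete, here is a runnable toy jump-table interpreter for a two-opcode stack machine; it is entirely illustrative and far simpler than the real loop in core/vm:

```go
package main

import "fmt"

const (
	opStop  = 0x00
	opAdd   = 0x01 // pop two, push sum
	opPush1 = 0x60 // push the next code byte
)

type vm struct {
	stack []uint64
	gas   uint64
}

type instruction struct {
	constantGas uint64
	execute     func(v *vm, code []byte, pc *int)
}

// jumpTable maps opcode → gas cost + handler, like core/vm's JumpTable.
var jumpTable = map[byte]instruction{
	opPush1: {constantGas: 3, execute: func(v *vm, code []byte, pc *int) {
		(*pc)++ // consume the immediate operand
		v.stack = append(v.stack, uint64(code[*pc]))
	}},
	opAdd: {constantGas: 3, execute: func(v *vm, code []byte, pc *int) {
		n := len(v.stack)
		v.stack = append(v.stack[:n-2], v.stack[n-2]+v.stack[n-1])
	}},
}

func main() {
	code := []byte{opPush1, 2, opPush1, 40, opAdd, opStop} // compute 2 + 40
	v := &vm{gas: 100}
	for pc := 0; ; pc++ {
		op := code[pc] // ① fetch opcode
		if op == opStop {
			break
		}
		inst := jumpTable[op]          // ② decode via jump table
		v.gas -= inst.constantGas      // ④ charge constant gas (no dynamic gas here)
		inst.execute(v, code, &pc)     // ⑦ execute
	}
	fmt.Println("result:", v.stack[0], "gas left:", v.gas) // result: 42 gas left: 91
}
```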
Stage 7: State commit (Chapters 3/4)
After all transactions are executed:
```
FinalizeAndAssemble()
 ├─ Process withdrawals (validator balances)
 ├─ System-level operations (beacon root, etc.)
 ├─ statedb.Commit()
 │   ├─ Each modified account → storage trie updated and hashed
 │   ├─ Account trie updated and hashed
 │   └─ Produces 32-byte Merkle state root
 └─ Assemble block (header + txs + receipts + withdrawals)
```

This block is the payload returned to the CL via GetPayload.
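The key idea is the two-level roll-up: each account's storage hashes down to a root, and those roots feed the account trie. A toy illustration that substitutes sha256 over sorted pairs for the real Merkle Patricia Trie:

```go
package main

import (
	"crypto/sha256"
	"fmt"
	"sort"
)

// hashKVs deterministically hashes a key→value map; a stand-in for
// committing a Merkle Patricia Trie (what geth actually uses).
func hashKVs(kvs map[string]string) [32]byte {
	keys := make([]string, 0, len(kvs))
	for k := range kvs {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	h := sha256.New()
	for _, k := range keys {
		h.Write([]byte(k))
		h.Write([]byte(kvs[k]))
	}
	var root [32]byte
	copy(root[:], h.Sum(nil))
	return root
}

func main() {
	// Per-account storage → storage roots (the storage-trie layer).
	storage := map[string]map[string]string{
		"0xaa": {"slot0": "1"},
		"0xbb": {"slot0": "7", "slot1": "9"},
	}
	accounts := map[string]string{}
	for addr, slots := range storage {
		root := hashKVs(slots)
		accounts[addr] = fmt.Sprintf("%x", root) // storage root stored inside the account
	}
	// Account entries → the single 32-byte state root in the block header.
	stateRoot := hashKVs(accounts)
	fmt.Printf("state root: %x\n", stateRoot)
}
```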
Stage 8: Block insertion (Chapter 10)
After the CL validates the block on the beacon chain, it sends it back to geth via NewPayload:
```
InsertBlockWithoutSetHead()
 ├─ Header validation (timestamp, gas limit, consensus constraints)
 ├─ Body validation (tx root, uncles hash, withdrawals hash)
 ├─ State processing: re-execute all transactions (identical to stages 5-6!)
 │    Compare resulting state root with header's StateRoot
 │    Mismatch → reject block (INVALID)
 └─ writeBlockWithState() persist to disk
```
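The INVALID check reduces to one comparison. A shape-only sketch with invented types; the real path re-runs the stage 5-6 machinery:

```go
package engine

import "bytes"

// Block is an illustrative stand-in: a real block carries a full header
// and typed transactions.
type Block struct {
	HeaderStateRoot []byte
	Txs             [][]byte
}

// process would re-execute every transaction and return the resulting
// state root; in geth this is the same code path the builder used.
func process(b *Block) []byte { /* stages 5-6 again */ return nil }

// validate mirrors the newPayload decision: the locally computed root
// must match the one the proposer put in the header.
func validate(b *Block) string {
	if !bytes.Equal(process(b), b.HeaderStateRoot) {
		return "INVALID"
	}
	return "VALID"
}
```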
Then the CL calls ForkchoiceUpdated to designate the new head:

```
→ writeHeadBlock() updates canonical chain pointers
→ emit ChainHeadEvent
```

Stage 9: Persistence and propagation (Chapters 5/11/12)
Data flows down the storage stack:
```
StateDB → Trie → TrieDB → ethdb (Pebble/LevelDB)
                             │
                             └─ Old blocks → Freezer (append-only ancient storage)
```

Meanwhile, the handler broadcasts the new block to peers. Lagging nodes catch up via snap sync or full sync.
Stage 10: Finalization
The CL eventually marks the block (or its ancestor) as finalized:
```
Geth updates the currentFinalBlock pointer
 → This transaction can never be rolled back
 → Permanently part of the canonical chain
```

From the user pressing send to finalization, a transaction passes through RPC → pool → P2P → miner → state transition → EVM → StateDB → trie → blockchain → storage → network → finality, spanning every major subsystem in geth.
Q2: What are the cross-cutting design patterns in the geth codebase?
Recognizing these patterns gives you an "I've seen this before" feeling when reading any subsystem.
Pattern 1: Lifecycle interface (start/stop contract)
```go
type Lifecycle interface {
	Start() error
	Stop() error
}
```

Almost all long-running components follow this contract:
```
Component    Start()                        Stop()
──────────────────────────────────────────────────────────────────────────────
Ethereum     setupDiscovery, handler.Start  handler.Stop, txPool.Close, blockchain.Stop
handler      txBroadcastLoop, txFetcher     stop sync and broadcast
P2P Server   TCP listen, discovery, dialer  disconnect all peers
```

Key rules:
- Registration must precede Start — registering at runtime panics
- Stop in reverse order — consumers first, producers next, storage last
- Naming may vary (Init/Close, New/Stop), but the pattern is the same
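A minimal sketch of that discipline (an assumption-level simplification, not geth's actual node code):

```go
package node

import "errors"

type Lifecycle interface {
	Start() error
	Stop() error
}

type Node struct {
	lifecycles []Lifecycle
	running    bool
}

// RegisterLifecycle must be called before Start; registering at runtime
// is a programming error, hence the panic.
func (n *Node) RegisterLifecycle(l Lifecycle) {
	if n.running {
		panic("lifecycle registered after node start")
	}
	n.lifecycles = append(n.lifecycles, l)
}

func (n *Node) Start() error {
	n.running = true
	for i, l := range n.lifecycles {
		if err := l.Start(); err != nil {
			// Roll back whatever already started, in reverse order.
			for j := i - 1; j >= 0; j-- {
				n.lifecycles[j].Stop()
			}
			return err
		}
	}
	return nil
}

// Stop shuts everything down in reverse registration order:
// consumers first, producers next, storage last.
func (n *Node) Stop() error {
	var errs []error
	for i := len(n.lifecycles) - 1; i >= 0; i-- {
		if err := n.lifecycles[i].Stop(); err != nil {
			errs = append(errs, err)
		}
	}
	n.running = false
	return errors.Join(errs...)
}
```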
Pattern 2: Event Feed (publish/subscribe decoupling)
```go
// Producer
type BlockChain struct {
	chainHeadFeed event.Feed
}

func (bc *BlockChain) insertChain(...) {
	bc.chainHeadFeed.Send(ChainHeadEvent{Block: block})
}
```
```go
// Consumer
headCh := make(chan core.ChainHeadEvent, 64)
sub := bc.SubscribeChainHeadEvent(headCh)
for {
	select {
	case head := <-headCh:
		_ = head // react to the new chain head here
	case <-sub.Err():
		return
	}
}
```

Key feeds in geth:
```
Event           Producer         Consumer                   Purpose
──────────────────────────────────────────────────────────────────────────────
ChainHeadEvent  BlockChain       miner, handler,            new block arrived
                                 filter system, txpool
NewTxsEvent     TxPool           handler                    new tx, needs broadcast
WalletEvent     account backend  startNode wallet listener  wallet plugged/unplugged
ChainSideEvent  BlockChain       filter system              side chain block (uncles)
```

Why Feed instead of direct calls? Decoupling. BlockChain doesn't need to import the miner package, and miner doesn't need to import the handler package. They communicate through events, unaware of each other's existence. Adding a new consumer requires no changes to producer code.
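The pattern is easy to try end to end; a self-contained example using geth's event package, with a made-up event type:

```go
package main

import (
	"fmt"

	"github.com/ethereum/go-ethereum/event"
)

type ChainHeadEvent struct{ Number uint64 }

func main() {
	var feed event.Feed // the zero value is ready to use

	// Consumer: subscribe a typed channel.
	ch := make(chan ChainHeadEvent, 16)
	sub := feed.Subscribe(ch)
	defer sub.Unsubscribe()

	// Producer: Send delivers to every subscribed channel and returns
	// the number of subscribers reached.
	n := feed.Send(ChainHeadEvent{Number: 1})

	fmt.Println("delivered to", n, "subscriber(s):", <-ch)
}
```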
Pattern 3: Backend interface (API vs. implementation separation)
```
JSON-RPC methods (ethapi/api.go)
      │
      │ calls
      ▼
Backend interface (ethapi/backend.go)
      │
      │ implemented by
      ├─ EthAPIBackend (full node)
      └─ LESAPIBackend (light client)
           │
           │ delegates to
           ▼
      BlockChain, TxPool, Miner, StateDB...
```

eth_getBalance code doesn't care whether it's running on a full node or a light client; it just calls backend.StateAndHeaderByNumberOrHash(). The concrete implementation decides where the data comes from (local database vs. remote request).
This pattern appears in many places:
- Backend interface → bridge between RPC API and core
- consensus.Engine interface → bridge between consensus logic and blockchain
- ethdb.KeyValueStore interface → bridge between storage logic and the concrete KV engine
- txpool.SubPool interface → bridge between the pool coordinator and pool implementations
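All four share the same shape. A stripped-down sketch with invented names:

```go
package backend

// Balancer is the narrow interface the API layer codes against.
type Balancer interface {
	BalanceAt(addr string) (uint64, error)
}

// FullNode answers from its local state database.
type FullNode struct{ state map[string]uint64 }

func (n *FullNode) BalanceAt(addr string) (uint64, error) {
	return n.state[addr], nil
}

// LightClient would instead fetch a Merkle proof from a remote peer
// and verify it against a trusted header (elided here).
type LightClient struct{ /* peer set, proof verifier, ... */ }

func (c *LightClient) BalanceAt(addr string) (uint64, error) {
	// ...request + verify proof...
	return 0, nil
}

// getBalance is the RPC-handler side: it never learns which
// implementation it was handed.
func getBalance(b Balancer, addr string) (uint64, error) {
	return b.BalanceAt(addr)
}
```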
Pattern 4: Config struct (one configuration aggregate per subsystem)
```
CLI flags
   │
   │ utils.SetNodeConfig()
   ▼
node.Config      → data dir, P2P settings, RPC endpoints
   │
   │ utils.SetEthConfig()
   ▼
ethconfig.Config → sync mode, cache size, gas price
   │
   ├─ p2p.Config    → max peers, listen addr, NAT, bootnodes
   ├─ ChainConfig   → fork activation times, chain ID, consensus rules
   └─ TxPool.Config → price limits, slot limits, journal settings
```

Each Config has hardcoded defaults, which a TOML file can override, which CLI flags can override again. The three layers stack, and the result is passed to the corresponding subsystem's constructor.
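The layering is just "later writers win"; a minimal sketch with illustrative fields and the TOML/flag decoding elided:

```go
package config

// Config is an illustrative subsystem configuration.
type Config struct {
	MaxPeers int
	CacheMB  int
	DataDir  string
}

// Layer 1: hardcoded defaults.
func DefaultConfig() Config {
	return Config{MaxPeers: 50, CacheMB: 1024, DataDir: "~/.ethereum"}
}

// Layer 2: TOML file overrides (file decoding elided).
func applyTOML(cfg *Config) { cfg.CacheMB = 4096 }

// Layer 3: CLI flags override everything before them.
func applyFlags(cfg *Config) { cfg.MaxPeers = 100 }

// Load stacks the three layers; the result goes to the constructor.
func Load() Config {
	cfg := DefaultConfig()
	applyTOML(&cfg)
	applyFlags(&cfg)
	return cfg // MaxPeers=100, CacheMB=4096, DataDir still the default
}
```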
Pattern 5: Four-layer storage model (read-through + write-flush)
```
Layer 4: StateDB   in-memory dirty maps + journal rollback
         │ read-through ↓       ↑ writes accumulate
Layer 3: Trie      Merkle Patricia Trie nodes
         │ read-through ↓       ↑ commit flushes
Layer 2: TrieDB    caching layer (path-based or hash-based)
         │ read-through ↓       ↑ flush to disk
Layer 1: ethdb     KV store (Pebble/LevelDB) + Freezer
```

Read direction: StateDB checks its dirty cache first → a miss penetrates to the trie → the trie penetrates to TrieDB → TrieDB penetrates to disk.
Write direction: Modifications accumulate in StateDB’s dirty map → commit flushes to trie → trie commits to TrieDB → TrieDB eventually flushes to disk.
This “accumulate at upper layers, periodically flush to lower layers” pattern makes per-transaction state modifications extremely cheap (memory only), paying the trie hashing and disk write cost only at block commit time.
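A two-layer sketch of the read-through/write-flush discipline, using plain maps in place of tries and databases:

```go
package storage

// disk stands in for Layer 1 (Pebble/LevelDB).
type disk struct{ kv map[string]string }

func (d *disk) Get(k string) (string, bool) { v, ok := d.kv[k]; return v, ok }
func (d *disk) Put(k, v string)             { d.kv[k] = v }

// cache stands in for an upper layer: writes accumulate in memory,
// reads penetrate downward on a miss.
type cache struct {
	dirty map[string]string
	below *disk
}

func newCache(below *disk) *cache {
	return &cache{dirty: make(map[string]string), below: below}
}

// Get checks the dirty map first; a miss reads through to the layer below.
func (c *cache) Get(k string) (string, bool) {
	if v, ok := c.dirty[k]; ok {
		return v, true // upper-layer hit: memory only
	}
	return c.below.Get(k)
}

// Set is memory-only, which keeps per-transaction writes cheap.
func (c *cache) Set(k, v string) { c.dirty[k] = v }

// Flush pays the batch cost once, e.g. at block commit time.
func (c *cache) Flush() {
	for k, v := range c.dirty {
		c.below.Put(k, v)
	}
	c.dirty = make(map[string]string)
}
```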
Why recognizing these patterns matters
When you encounter an unfamiliar subsystem:
1. Does it have Start/Stop? → Lifecycle pattern; find where it's registered
2. Does it Send or Subscribe to something? → Event Feed pattern; find producers and consumers
3. Does it call an interface or a concrete type? → Backend pattern; find the implementation
4. Does its constructor accept a Config? → Config pattern; find defaults and CLI mapping
5. How many layers does its data cross? → Four-layer storage; find the penetration and flush paths

These five questions can help you locate any subsystem's position in the geth architecture within minutes.