Q1: What is the complete lifecycle of a peer connection, from discovery to disconnect?
Network stack overview
Geth’s P2P network consists of two independent channels:
```
UDP channel (node discovery)        TCP channel (data transport)
┌─────────────────────────┐         ┌──────────────────────────────┐
│ discv4 / discv5         │         │ Sub-protocols (eth/68, snap) │
│ Kademlia DHT            │         │ devp2p base protocol         │
│ PING/PONG/FINDNODE      │         │ RLPx encrypted transport     │
└─────────────────────────┘         └──────────────────────────────┘
```

UDP only handles “discover who’s online.” TCP handles “actual communication.” The two are completely independent — discovering a node doesn’t mean connecting to it, and connecting to a node doesn’t require discovering it (e.g., static nodes).
Complete lifecycle: 6 stages
```
Stage 1: Discovery (UDP)
  Routing table populated via PING/PONG/FINDNODE with candidate nodes
    │
    ▼
Stage 2: Dial (TCP)
  Dial scheduler picks candidates from FairMix, establishes TCP connection
    │
    ▼
Stage 3: RLPx encryption handshake
  ECIES key exchange → derive AES + MAC keys
  Checkpoint: main loop checks MaxPeers, duplicates, self-connection
    │
    ▼
Stage 4: devp2p protocol handshake
  Exchange capabilities (eth/68, snap/1, etc.)
  Checkpoint: main loop checks at least one matching sub-protocol
    │
    ▼
Stage 5: Message loop
  readLoop decrypts → routes (base protocol handled directly,
    sub-protocols dispatched to protoRW)
  pingLoop heartbeat every 15 seconds
  One goroutine per matched sub-protocol handles messages
    │
    ▼
Stage 6: Disconnect
  Network error / remote disconnect / protocol error / local shutdown
  → Peer.run() exits → notify server main loop → dial scheduler fills vacancy
```

Stages 1-2: Discovery and dialing
Discovery fills the routing table via UDP (see Q3). Multiple discovery sources (discv4, discv5, DNS, static nodes) are merged through FairMix into a single iterator.
The dial scheduler consumes this iterator and decides which nodes to dial based on available slots:
```
MaxPeers = 50, DialRatio = 3

Outbound slots = 50 / 3  = 16  (actively dialed)
Inbound slots  = 50 - 16 = 34  (waiting for others to connect)
```

Static nodes are always dialed and automatically reconnected on disconnect. Dynamic nodes fill the remaining slots.
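The slot split is plain integer arithmetic. A minimal sketch (`dialSlots` is an illustrative name, not Geth's actual API):

```go
package main

import "fmt"

// dialSlots mirrors the split described above: outbound slots are
// MaxPeers / DialRatio, and everything left over waits for inbound peers.
func dialSlots(maxPeers, dialRatio int) (outbound, inbound int) {
	outbound = maxPeers / dialRatio
	inbound = maxPeers - outbound
	return
}

func main() {
	out, in := dialSlots(50, 3)
	fmt.Println(out, in) // 16 34
}
```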
Stage 3: RLPx encryption handshake
After TCP connection is established, SetupConn() starts the two-phase handshake. Phase 1 is RLPx encryption:
```
Initiator (dialer)                             Responder (listener)
      │                                               │
      │ auth message (ECIES encrypted)                │
      │ { signature, initiator pubkey, nonce }        │
      │ ────────────────────────────────────────────→ │
      │                                               │ Decrypt, verify signature
      │ auth-ack message (ECIES encrypted)            │
      │ { responder ephemeral pubkey, nonce }         │
      │ ←──────────────────────────────────────────── │
      │                                               │
      │ Both sides derive shared secrets:             │
      │   ecdheSecret → sharedSecret                  │
      │     → aesSecret → MAC key                     │
```

After the handshake, the connection upgrades to an encrypted channel: AES-256-CTR encryption plus Keccak-256 MAC authentication. All subsequent messages are sent as encrypted frames.
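The derivation chain in the diagram can be sketched as repeated hashing. This is an illustrative sketch only: `hash2` and `deriveSecrets` are made-up names, and SHA-256 stands in for the Keccak-256 the real RLPx transport uses, so the outputs are not wire-compatible.

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// hash2 hashes the concatenation of two byte strings. SHA-256 is a
// stand-in for Keccak-256 here.
func hash2(a, b []byte) []byte {
	h := sha256.New()
	h.Write(a)
	h.Write(b)
	return h.Sum(nil)
}

// deriveSecrets follows the chain named in the diagram:
// ecdheSecret → sharedSecret → aesSecret → MAC key.
func deriveSecrets(ecdhe, initNonce, respNonce []byte) (shared, aes, mac []byte) {
	shared = hash2(ecdhe, hash2(respNonce, initNonce))
	aes = hash2(ecdhe, shared)  // AES-256-CTR key
	mac = hash2(ecdhe, aes)     // MAC key
	return
}

func main() {
	// Dummy inputs; in reality ecdhe comes from the ephemeral ECDH exchange.
	shared, aes, mac := deriveSecrets([]byte("ecdhe"), []byte("n1"), []byte("n2"))
	fmt.Println(len(shared), len(aes), len(mac)) // 32 32 32
}
```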
The connection then passes through the first checkpoint — sent via channel to the server’s main loop run(), where single-threaded checks occur:
```
// Check order:
1. Not trusted and peer count full?    → DiscTooManyPeers
2. Not trusted and inbound count full? → DiscTooManyPeers
3. Already connected to same node?     → DiscAlreadyConnected
4. Connected to self?                  → DiscSelf
```

Why route checks through the main loop instead of checking in place? Because peer counting must be serialized in a single thread — multiple concurrent handshakes each reading the peer map would create race conditions.
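The serialization idea can be shown with a toy admission checker: one goroutine owns the peer map, and connections arrive over a channel, so the map is never read concurrently. All names (`conn`, `runChecker`) are illustrative, not Geth's types.

```go
package main

import "fmt"

type conn struct {
	id      string
	trusted bool
}

// runChecker serializes admission checks in a single goroutine,
// mirroring how the server main loop owns the peer set.
func runChecker(in <-chan conn, out chan<- error, maxPeers int) {
	peers := map[string]bool{}
	for c := range in {
		switch {
		case !c.trusted && len(peers) >= maxPeers:
			out <- fmt.Errorf("too many peers")
		case peers[c.id]:
			out <- fmt.Errorf("already connected")
		default:
			peers[c.id] = true
			out <- nil
		}
	}
}

func main() {
	in, out := make(chan conn), make(chan error)
	go runChecker(in, out, 2)
	for _, c := range []conn{{"a", false}, {"a", false}, {"b", false}, {"c", false}} {
		in <- c
		fmt.Println(<-out) // accept, already connected, accept, too many peers
	}
}
```

Because every handshake goroutine funnels through the same channel, no mutex is needed around the peer map.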
Stage 4: devp2p protocol handshake
With the encrypted channel established, both sides exchange protocol handshake messages:
```go
type protoHandshake struct {
	Version    uint64
	Name       string // "Geth/v1.16.7-stable/linux-amd64/go1.23.0"
	Caps       []Cap  // [{eth 68}, {snap 1}]
	ListenPort uint64
	ID         []byte // secp256k1 public key
}
```

Once both sides know each other’s supported protocols, the connection passes through the second checkpoint:
```go
func addPeerChecks(...) error {
	// At least one matching sub-protocol?
	if countMatchingProtocols(srv.Protocols, c.caps) == 0 {
		return DiscUselessPeer // No common language, disconnect
	}
	// Re-check peer counts (may have changed between checkpoints)
	return srv.postHandshakeChecks(peers, inboundCount, c)
}
```

Stage 5: Message loop
After passing both checkpoints, launchPeer() creates a Peer and starts multiple goroutines:
```
Peer.run()
 ├─ readLoop()   — reads from encrypted connection, routes by message code
 ├─ pingLoop()   — sends PING every 15 seconds, replies PONG to received PINGs
 ├─ eth/68 Run() — Ethereum protocol handler goroutine
 └─ snap/1 Run() — snap sync protocol handler goroutine
```

readLoop’s routing logic after reading a message:
```
Message codes 0-15  → base protocol (ping/pong/disconnect) handled directly
Message codes 16-32 → eth/68's protoRW.in channel
Message codes 33-40 → snap/1's protoRW.in channel
```

Stage 6: Disconnect
Any goroutine error triggers disconnect:
```
readLoop network error  ──┐
pingLoop timeout        ──┤
Protocol handler error  ──┼──→ Peer.run() exits
Remote sends discMsg    ──┤         │
Local calls Disconnect  ──┘         ▼
                        close(p.closed)  → signal all goroutines to exit
                        p.rw.close()     → close TCP connection
                        p.wg.Wait()      → wait for all goroutines
                        delpeer channel  → notify server main loop
                               │
                               ▼
                        Main loop: delete(peers, id)
                        Dial scheduler: peerRemoved → fill vacancy
```

Q2: How do multiple sub-protocols share a single TCP connection? (Message multiplexing)
The problem
Geth runs multiple protocols simultaneously — eth/68 (blocks and transactions), snap/1 (fast sync), etc. But each peer has only one TCP connection. How do multiple protocols’ messages coexist on the same connection without interference?
Message code segmentation
The core idea: give each protocol a non-overlapping range of message codes.
The devp2p base protocol reserves codes 0–15. Sub-protocols start from 16, allocated in match order:
| Code range | Protocol |
|---|---|
| 0 - 15 | devp2p base protocol (handshake, ping, pong, disconnect) |
| 16 - 32 | eth/68 (17 message codes) |
| 33 - 40 | snap/1 (8 message codes) |

Allocation happens in matchProtocols():
```go
func matchProtocols(protocols []Protocol, caps []Cap, rw MsgReadWriter) map[string]*protoRW {
	offset := baseProtocolLength // start at 16
	result := make(map[string]*protoRW)

	for _, cap := range caps {
		for _, proto := range protocols {
			if proto.Name == cap.Name && proto.Version == cap.Version {
				result[cap.Name] = &protoRW{
					Protocol: proto,
					offset:   offset, // this protocol's base code
					in:       make(chan Msg),
					w:        rw,
				}
				offset += proto.Length // next protocol starts here
			}
		}
	}
	return result
}
```

If both sides support multiple versions of the same protocol (e.g., eth/67 and eth/68), only the highest version is kept — the older version’s code range is reclaimed.
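The highest-version rule can be sketched in a few lines (`highestVersions` is an illustrative helper, not the function Geth uses):

```go
package main

import "fmt"

type Cap struct {
	Name    string
	Version uint
}

// highestVersions keeps only the newest advertised version per protocol
// name, matching the rule that eth/68 supersedes eth/67.
func highestVersions(caps []Cap) map[string]uint {
	best := map[string]uint{}
	for _, c := range caps {
		if c.Version > best[c.Name] {
			best[c.Name] = c.Version
		}
	}
	return best
}

func main() {
	fmt.Println(highestVersions([]Cap{{"eth", 67}, {"eth", 68}, {"snap", 1}}))
}
```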
protoRW: transparent translation
Each sub-protocol handler reads and writes through protoRW, which automatically translates between “protocol-local codes” and “wire codes”:
```
eth/68 handler's view        protoRW translation    actual wire code
code 0 (StatusMsg)         → 0 + 16 = 16          → sends code 16
code 1 (NewBlockHashes)    → 1 + 16 = 17          → sends code 17

snap/1 handler's view
code 0 (GetAccountRange)   → 0 + 33 = 33          → sends code 33
```

Add offset on write, subtract on read:
```go
// Write: protocol-local code → wire code
func (rw *protoRW) WriteMsg(msg Msg) error {
	msg.Code += rw.offset // add offset
	// ...wait for write token, then write to connection
}

// Read: wire code → protocol-local code
func (rw *protoRW) ReadMsg() (Msg, error) {
	msg := <-rw.in
	msg.Code -= rw.offset // subtract offset
	return msg, nil
}
```

Protocol handlers have no idea what their wire codes are. eth/68’s StatusMsg is always code 0, regardless of whether it’s 16 on the wire or some other value. This completely decouples protocol implementation from protocol composition.
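A self-contained sketch of the translation, with illustrative types (`owns`, `toWire`, `fromWire` are made-up names; the real protoRW also carries channels and the write token). It also shows the range check readLoop uses to route an incoming wire code:

```go
package main

import "fmt"

type Msg struct{ Code uint64 }

type protoRW struct {
	name          string
	offset, count uint64
}

// owns reports whether a wire code falls in this protocol's range.
func (rw *protoRW) owns(code uint64) bool {
	return code >= rw.offset && code < rw.offset+rw.count
}

func (rw *protoRW) toWire(m Msg) Msg   { m.Code += rw.offset; return m }
func (rw *protoRW) fromWire(m Msg) Msg { m.Code -= rw.offset; return m }

func main() {
	eth := &protoRW{name: "eth/68", offset: 16, count: 17}
	snap := &protoRW{name: "snap/1", offset: 33, count: 8}

	wire := eth.toWire(Msg{Code: 1}) // NewBlockHashes in eth's local numbering
	fmt.Println(wire.Code)           // 17 on the wire

	// Remote side: route by range, then strip the offset back off.
	for _, rw := range []*protoRW{eth, snap} {
		if rw.owns(wire.Code) {
			fmt.Println(rw.name, rw.fromWire(wire).Code) // eth/68 1
		}
	}
}
```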
Read direction: readLoop routing
```
TCP connection → readLoop reads → message code 17
      │
      getProto(17): eth's range is [16, 33), 17 is in range
      │
      ▼
eth's protoRW.in <- msg
      │
      ▼
eth handler ReadMsg(): msg.Code = 17 - 16 = 1 (NewBlockHashesMsg)
```

Write direction: serialization token
Multiple protocol goroutines run concurrently, but there’s physically only one TCP connection. If two goroutines write simultaneously, messages would interleave and corrupt on the wire.
The solution is a capacity-1 channel as a write token:
```
writeStart channel (capacity 1, initially holds one token)

eth goroutine wants to write:
  1. <-writeStart      // take the token (block if unavailable)
  2. rw.WriteMsg()     // write the message
  3. werr <- err       // report result
  4. Peer.run() receives result, writeStart <- struct{}{}  // return token

snap goroutine wants to write:
  1. <-writeStart      // wait for eth to finish and return the token
  2. ...
```

Only one goroutine can hold the token at a time, ensuring messages are written intact without using a mutex.
Concrete example
Suppose eth/68 and snap/1 are running simultaneously:
```
eth/68 goroutine:                  snap/1 goroutine:
send NewBlockHashes (code 1)       send GetAccountRange (code 0)
→ protoRW: 1 + 16 = 17             → protoRW: 0 + 33 = 33
→ wait for write token...          → wait for write token...
→ got token, write code 17         → waiting...
→ done, return token               → got token, write code 33
                                   → done, return token

Remote readLoop:
  reads code 17 → getProto(17) → eth  → eth.in <- msg  → eth sees code 1
  reads code 33 → getProto(33) → snap → snap.in <- msg → snap sees code 0
```

Q3: How does Kademlia DHT node discovery work?
The problem
Ethereum has no central server. When a new node starts, how does it find other nodes on the network? The answer is Kademlia Distributed Hash Table (DHT) — a fully decentralized node discovery mechanism.
Core concept: XOR distance
Kademlia defines “distance” between two nodes using XOR:
```
Node A's ID: 0x1010...  (Keccak256 of public key)
Node B's ID: 0x1001...

Distance = XOR(A, B) = 0x0011...
```

This is not physical distance — two nodes in New York and Tokyo can be very “close” in XOR space. XOR distance has two important properties:
- Symmetry: `XOR(A, B) = XOR(B, A)` — the distance from A to B equals the distance from B to A
- Triangle inequality: `XOR(A, C) ≤ XOR(A, B) + XOR(B, C)`
The distance level is log2 of the XOR value. The more leading bits two IDs share, the smaller the XOR, and the closer the nodes are.
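The distance level is just the bit length of XOR(a, b). A one-byte sketch (`logdist` is an illustrative name; real node IDs are 256 bits):

```go
package main

import (
	"fmt"
	"math/bits"
)

// logdist returns the distance level between two IDs: the bit length
// of their XOR. Identical IDs are at level 0; more shared leading bits
// mean a lower level.
func logdist(a, b byte) int {
	return 8 - bits.LeadingZeros8(a^b)
}

func main() {
	fmt.Println(logdist(0b10100000, 0b10010000)) // share 2 leading bits → level 6
	fmt.Println(logdist(0b10100000, 0b10100001)) // share 7 leading bits → level 1
	fmt.Println(logdist(0xAA, 0xAA))             // identical → level 0
}
```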
Routing table: bucket organization
The routing table is divided into buckets by distance level:
```
Bucket 0:  Closest nodes (most shared prefix bits)
Bucket 1:  Slightly farther nodes
...
Bucket 16: Farthest nodes (almost completely different prefix)

Each bucket:
  entries[16]      — up to 16 active nodes, most recently used first
  replacements[10] — up to 10 replacement candidates
```

Key design: know more detail about nearby regions, only a few representatives for distant regions. Like real life — you know every building in your neighborhood, but only a few landmarks in another city.
Defense: IP limits
To prevent Sybil attacks (attacker generating massive fake nodes), the routing table has IP limits:
```
Per bucket:  max 2 nodes from the same /24 subnet
Whole table: max 10 nodes from the same /24 subnet
```

Even if an attacker controls an entire /24 subnet (256 IPs), they can only occupy limited positions in the routing table.
Four message types
Discovery v4 has 4 core messages (plus 2 EIP-868 extensions):
```
PING      → "Are you alive?"
PONG      ← "I am, and your external IP is X" (helps NAT discovery)
FINDNODE  → "Give me nodes closest to this target"
NEIGHBORS ← "Here are the closest 12 nodes"
```

Each packet carries an ECDSA signature. The receiver can recover the sender’s public key (i.e., node ID) from the signature, without prior key exchange.
Lookup algorithm: finding specific nodes
Lookup is Kademlia’s core operation — given a target ID, find the nodes closest to it in the network:
```
Goal: find nodes closest to target T

Round 1: Pick the 3 nodes (alpha = 3) closest to T from own routing table
         Send FINDNODE(T) to them concurrently

         Node A replies: [X, Y, Z]  (its known nodes closest to T)
         Node B replies: [Y, W, V]
         Node C replies: [W, U, S]

Round 2: Merge all results, sort by XOR distance to T
         Pick the 3 closest unqueried nodes
         Send FINDNODE(T) concurrently

         These nodes are closer to T, so their neighbors are even closer to T
         → replies contain closer nodes

Round 3: Continue...

Convergence: when a round discovers no nodes closer than the best known,
             the algorithm terminates
```

Why does it converge? Each round queries nodes closer to the target than the previous round. In XOR space, nodes closer to the target know more detail about the target’s neighborhood (due to the routing table’s bucket design). So each round discovers closer nodes, until no closer ones exist.
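The heart of each round, "sort by XOR distance, pick the alpha closest unqueried nodes", can be sketched with one-byte IDs standing in for 256-bit node IDs (`pickClosest` is an illustrative helper):

```go
package main

import (
	"fmt"
	"sort"
)

// pickClosest sorts candidates by XOR distance to the target and
// returns the alpha closest that have not been queried yet.
func pickClosest(candidates []byte, queried map[byte]bool, target byte, alpha int) []byte {
	sort.Slice(candidates, func(i, j int) bool {
		return candidates[i]^target < candidates[j]^target
	})
	next := []byte{}
	for _, c := range candidates {
		if !queried[c] && len(next) < alpha {
			next = append(next, c)
		}
	}
	return next
}

func main() {
	// 0x11 was already queried last round; the next round targets the
	// three closest remaining candidates.
	got := pickClosest([]byte{0x1F, 0x11, 0x80, 0x13}, map[byte]bool{0x11: true}, 0x10, 3)
	fmt.Printf("% X\n", got) // 13 1F 80
}
```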
Bootstrap process
When a new node starts for the first time, the routing table is empty:
```
1. Start from hardcoded bootnodes
   MainnetBootnodes = [
     "enode://d860a01f...@18.138.108.67:30303",
     "enode://22a8232c...@3.209.45.79:30303",
     ...
   ]

2. PING bootnode → receive PONG → bootnode enters routing table

3. Execute lookup(own ID)
   → send FINDNODE(myID) to bootnode
   → bootnode replies with nodes closest to myID
   → these nodes enter routing table
   → continue lookup, routing table fills rapidly

4. Node database (enode.DB) persists to disk
   → next restart doesn't need to bootstrap from scratch
```

Performing a lookup on your own ID is a clever design — it fills buckets across all distance ranges, because the query path passes through nodes at various distances.
Discovery v5 improvements
v5 adds to v4:
| | v4 | v5 |
|---|---|---|
| Authentication | ECDSA signature per packet (65 bytes) | Session keys after initial WHOAREYOU challenge |
| Node info | Basic IP + port | Full ENR records (extensible key-value pairs) |
| Targeted discovery | Not supported | Topic advertisement (e.g., “find light client servers”) |
Both versions can run simultaneously. FairMix merges their results into a single candidate node stream for the dial scheduler.
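The merging idea can be sketched as a round-robin over finite candidate lists. This is a simplification: the real FairMix is a live iterator with per-source timeouts, and `fairMix` here is an illustrative name.

```go
package main

import "fmt"

// fairMix interleaves candidate sources round-robin, the way FairMix
// merges discv4, discv5, DNS, and static-node iterators into one stream.
func fairMix(sources ...[]string) []string {
	var out []string
	for i := 0; ; i++ {
		emitted := false
		for _, s := range sources {
			if i < len(s) {
				out = append(out, s[i])
				emitted = true
			}
		}
		if !emitted {
			return out
		}
	}
}

func main() {
	fmt.Println(fairMix(
		[]string{"v4-node1", "v4-node2"},
		[]string{"dns-node1"},
	)) // [v4-node1 dns-node1 v4-node2]
}
```

Interleaving keeps one slow or spammy source from monopolizing the dial scheduler's candidate stream.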
Some information may be outdated






