Geth(11) QA

Q1: What is the complete lifecycle of a peer connection, from discovery to disconnect?#

Network stack overview#

Geth’s P2P network consists of two independent channels:

 UDP channel (node discovery)       TCP channel (data transport)
┌─────────────────────────┐       ┌──────────────────────────────┐
│ discv4 / discv5         │       │ Sub-protocols (eth/68, snap) │
│ Kademlia DHT            │       │ devp2p base protocol         │
│ PING/PONG/FINDNODE      │       │ RLPx encrypted transport     │
└─────────────────────────┘       └──────────────────────────────┘

UDP only handles “discover who’s online.” TCP handles “actual communication.” The two are completely independent — discovering a node doesn’t mean connecting to it, and connecting to a node doesn’t require discovering it (e.g., static nodes).

Complete lifecycle: 6 stages#

Stage 1: Discovery (UDP)
Routing table populated via PING/PONG/FINDNODE with candidate nodes
Stage 2: Dial (TCP)
Dial scheduler picks candidates from FairMix, establishes TCP connection
Stage 3: RLPx encryption handshake
ECIES key exchange → derive AES + MAC keys
Checkpoint: main loop checks MaxPeers, duplicates, self-connection
Stage 4: devp2p protocol handshake
Exchange capabilities (eth/68, snap/1, etc.)
Checkpoint: main loop checks at least one matching sub-protocol
Stage 5: Message loop
readLoop decrypts → routes (base protocol handled directly, sub-protocols dispatched to protoRW)
pingLoop heartbeat every 15 seconds
One goroutine per matched sub-protocol handles messages
Stage 6: Disconnect
Network error / remote disconnect / protocol error / local shutdown
→ Peer.run() exits → notify server main loop → dial scheduler fills vacancy

Stages 1-2: Discovery and dialing#

Discovery fills the routing table via UDP (see Q3). Multiple discovery sources (discv4, discv5, DNS, static nodes) are merged through FairMix into a single iterator.

The dial scheduler consumes this iterator and decides which nodes to dial based on available slots:

MaxPeers = 50, DialRatio = 3
Outbound slots = 50 / 3 = 16 (actively dialed)
Inbound slots = 50 - 16 = 34 (waiting for others to connect)

Static nodes are always dialed and automatically reconnected on disconnect. Dynamic nodes fill remaining slots.
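
A minimal sketch of this slot split, assuming a standalone helper (dialSlots is illustrative, not a real Geth function; the ratio logic mirrors what the text describes):

// dialSlots splits MaxPeers into outbound (actively dialed) and inbound slots.
func dialSlots(maxPeers, dialRatio int) (outbound, inbound int) {
    if dialRatio == 0 {
        dialRatio = 3 // default ratio when none is configured
    }
    outbound = maxPeers / dialRatio // slots the dial scheduler fills itself
    inbound = maxPeers - outbound   // slots left for incoming connections
    return outbound, inbound
}

With MaxPeers = 50 and DialRatio = 3 this yields 16 outbound and 34 inbound slots, matching the numbers above.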

Stage 3: RLPx encryption handshake#

Once the TCP connection is established, SetupConn() starts the two-phase handshake. Phase 1 is the RLPx encryption handshake:

Initiator (dialer)                              Responder (listener)
        │                                               │
        │  auth message (ECIES encrypted)               │
        │  { signature, initiator pubkey, nonce }       │
        │ ────────────────────────────────────────────→ │
        │                                               │ Decrypt, verify signature
        │  auth-ack message (ECIES encrypted)           │
        │  { responder ephemeral pubkey, nonce }        │
        │ ←──────────────────────────────────────────── │
        │                                               │
        │  Both sides derive shared secrets:            │
        │    ecdheSecret → sharedSecret                 │
        │    → aesSecret → MAC key                      │

After handshake, the connection upgrades to an encrypted channel: AES-256-CTR encryption + Keccak-256 MAC authentication. All subsequent messages are sent as encrypted frames.

The connection then passes through the first checkpoint: it is handed over a channel to the server’s main loop, run(), which performs the checks single-threaded:

// Check order:
1. Not trusted and peer count full? → DiscTooManyPeers
2. Not trusted and inbound count full? → DiscTooManyPeers
3. Already connected to same node? → DiscAlreadyConnected
4. Connected to self? → DiscSelf

Why route checks through the main loop instead of checking in place? Because peer counting must be serialized in a single thread — multiple concurrent handshakes each reading the peer map would create race conditions.
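
A minimal sketch of these serialized checks, using simplified stand-in types (nodeID, conn, server and the error values are illustrative, not Geth’s internal structs):

import "errors"

type nodeID [32]byte

type conn struct {
    id      nodeID
    inbound bool
    trusted bool
}

type server struct {
    selfID     nodeID
    maxPeers   int
    maxInbound int
    peers      map[nodeID]bool // IDs of currently connected peers
    inbound    int             // current inbound peer count
}

// checkpoint runs only inside the server's main loop, so it can read the
// peer set without locking: exactly one goroutine ever touches it.
func (s *server) checkpoint(c *conn) error {
    switch {
    case !c.trusted && len(s.peers) >= s.maxPeers:
        return errors.New("too many peers") // DiscTooManyPeers
    case !c.trusted && c.inbound && s.inbound >= s.maxInbound:
        return errors.New("too many peers") // DiscTooManyPeers
    case s.peers[c.id]:
        return errors.New("already connected") // DiscAlreadyConnected
    case c.id == s.selfID:
        return errors.New("connected to self") // DiscSelf
    default:
        return nil
    }
}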

Stage 4: devp2p protocol handshake#

With the encrypted channel established, both sides exchange protocol handshake messages:

type protoHandshake struct {
    Version    uint64
    Name       string // "Geth/v1.16.7-stable/linux-amd64/go1.23.0"
    Caps       []Cap  // [{eth 68}, {snap 1}]
    ListenPort uint64
    ID         []byte // secp256k1 public key
}

Once both sides know each other’s supported protocols, the connection passes through the second checkpoint:

func addPeerChecks(...) error {
    // At least one matching sub-protocol?
    if countMatchingProtocols(srv.Protocols, c.caps) == 0 {
        return DiscUselessPeer // No common language, disconnect
    }
    // Re-check peer counts (may have changed between checkpoints)
    return srv.postHandshakeChecks(peers, inboundCount, c)
}

Stage 5: Message loop#

After passing both checkpoints, launchPeer() creates a Peer and starts multiple goroutines:

Peer.run()
├─ readLoop() — reads from encrypted connection, routes by message code
├─ pingLoop() — sends PING every 15 seconds, replies PONG to received PINGs
├─ eth/68 Run() — Ethereum protocol handler goroutine
└─ snap/1 Run() — snap sync protocol handler goroutine

readLoop’s routing logic after reading a message:

Message codes 0-15 → base protocol (ping/pong/disconnect) handled directly
Message codes 16-32 → eth/68's protoRW.in channel
Message codes 33-40 → snap/1's protoRW.in channel

Stage 6: Disconnect#

Any goroutine error triggers disconnect:

readLoop network error   ──┐
pingLoop timeout         ──┤
Protocol handler error   ──┼──→ Peer.run() exits
Remote sends discMsg     ──┤          │
Local calls Disconnect   ──┘          ▼
                close(p.closed)  → signal all goroutines to exit
                p.rw.close()     → close TCP connection
                p.wg.Wait()      → wait for all goroutines
                delpeer channel  → notify server main loop
                Main loop: delete(peers, id)
                Dial scheduler: peerRemoved → fill vacancy
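
A minimal sketch of this shutdown sequence, with a simplified peer struct (field and method names are illustrative, not Geth’s exact ones):

import (
    "net"
    "sync"
)

type peer struct {
    closed chan struct{}  // closed to tell every goroutine to stop
    conn   net.Conn       // the underlying TCP connection
    wg     sync.WaitGroup // counts readLoop, pingLoop and protocol goroutines
}

// shutdown mirrors the sequence above: signal, close, wait, then notify.
func (p *peer) shutdown(delpeer chan<- *peer) {
    close(p.closed) // readLoop, pingLoop and handlers see this and return
    p.conn.Close()  // unblocks any goroutine stuck in a blocking read or write
    p.wg.Wait()     // wait until every goroutine has exited
    delpeer <- p    // server main loop deletes the peer; dial scheduler refills the slot
}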

Q2: How do multiple sub-protocols share a single TCP connection? (Message multiplexing)#

The problem#

Geth runs multiple protocols simultaneously — eth/68 (blocks and transactions), snap/1 (fast sync), etc. But each peer has only one TCP connection. How do multiple protocols’ messages coexist on the same connection without interference?

Message code segmentation#

The core idea: give each protocol a non-overlapping range of message codes.

The devp2p base protocol reserves codes 0–15. Sub-protocols start from 16, allocated in match order:

Code range   Protocol
0 - 15       devp2p base protocol (handshake, ping, pong, disconnect)
16 - 32      eth/68 (17 message codes)
33 - 40      snap/1 (8 message codes)

Allocation happens in matchProtocols():

func matchProtocols(protocols []Protocol, caps []Cap, rw MsgReadWriter) map[string]*protoRW {
    offset := baseProtocolLength // start at 16
    result := make(map[string]*protoRW)
    for _, cap := range caps {
        for _, proto := range protocols {
            if proto.Name == cap.Name && proto.Version == cap.Version {
                result[cap.Name] = &protoRW{
                    Protocol: proto,
                    offset:   offset, // this protocol's base code
                    in:       make(chan Msg),
                    w:        rw,
                }
                offset += proto.Length // next protocol starts here
            }
        }
    }
    return result
}

If both sides support multiple versions of the same protocol (e.g., eth/67 and eth/68), only the highest version is kept — the older version’s code range is reclaimed.

protoRW: transparent translation#

Each sub-protocol handler reads and writes through protoRW, which automatically translates between “protocol-local codes” and “wire codes”:

eth/68 handler's view protoRW translation actual wire code
code 0 (StatusMsg) → 0 + 16 = 16 → sends code 16
code 1 (NewBlockHashes) → 1 + 16 = 17 → sends code 17
snap/1 handler's view
code 0 (GetAccountRange) → 0 + 33 = 33 → sends code 33

Add offset on write, subtract on read:

// Write: protocol-local code → wire code
func (rw *protoRW) WriteMsg(msg Msg) error {
    msg.Code += rw.offset // add offset
    // ...wait for write token, then write to connection
}

// Read: wire code → protocol-local code
func (rw *protoRW) ReadMsg() (Msg, error) {
    msg := <-rw.in
    msg.Code -= rw.offset // subtract offset
    return msg, nil
}

Protocol handlers have no idea what their wire codes are. eth/68’s StatusMsg is always code 0, regardless of whether it’s 16 on the wire or some other value. This completely decouples protocol implementation from protocol composition.

Read direction: readLoop routing#

TCP connection → readLoop reads → message code 17
getProto(17):
eth's range is [16, 33), 17 is in range
eth's protoRW.in <- msg
eth handler ReadMsg():
msg.Code = 17 - 16 = 1 (NewBlockHashesMsg)
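
A minimal sketch of this routing step, with trimmed-down Msg and protoRW types (the real ones carry more fields):

import "fmt"

type Msg struct {
    Code    uint64
    Payload []byte
}

type protoRW struct {
    Name   string
    Length uint64   // number of message codes this protocol defines
    offset uint64   // first wire code assigned to this protocol
    in     chan Msg // readLoop delivers this protocol's messages here
}

// getProto returns the sub-protocol whose code range contains the wire code.
func getProto(running []*protoRW, code uint64) (*protoRW, error) {
    for _, p := range running {
        if code >= p.offset && code < p.offset+p.Length {
            return p, nil
        }
    }
    return nil, fmt.Errorf("no sub-protocol for message code %d", code)
}

With eth at offset 16 and length 17, getProto picks eth for wire code 17, and the handler later sees 17 - 16 = 1.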

Write direction: serialization token#

Multiple protocol goroutines run concurrently, but there’s physically only one TCP connection. If two goroutines write simultaneously, messages would interleave and corrupt on the wire.

The solution is a capacity-1 channel as a write token:

writeStart channel (capacity 1, initially holds one token)

eth goroutine wants to write:
  1. <-writeStart            // take the token (block if unavailable)
  2. rw.WriteMsg()           // write the message
  3. werr <- err             // report result
  4. Peer.run() receives result, writeStart <- struct{}{}   // return token

snap goroutine wants to write:
  1. <-writeStart            // wait for eth to finish and return the token
  2. ...

Only one goroutine can hold the token at a time, ensuring messages are written intact without using a mutex.
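
A runnable toy version of the token pattern. Note one simplification: here each writer returns the token itself, whereas in Geth it is Peer.run() that hands the token back after reading the write result:

package main

import (
    "fmt"
    "sync"
)

func main() {
    // Capacity-1 channel holding a single write token.
    writeStart := make(chan struct{}, 1)
    writeStart <- struct{}{}

    var wg sync.WaitGroup
    write := func(proto string, wireCode uint64) {
        defer wg.Done()
        <-writeStart // take the token; blocks while another goroutine holds it
        fmt.Printf("%s writes wire code %d\n", proto, wireCode)
        writeStart <- struct{}{} // return the token so the next writer can proceed
    }

    wg.Add(2)
    go write("eth/68", 17)
    go write("snap/1", 33)
    wg.Wait()
}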

Concrete example#

Suppose eth/68 and snap/1 are running simultaneously:

eth/68 goroutine:                  snap/1 goroutine:
send NewBlockHashes (code 1)       send GetAccountRange (code 0)
→ protoRW: 1 + 16 = 17             → protoRW: 0 + 33 = 33
→ wait for write token...          → wait for write token...
→ got token, write code 17         → waiting...
→ done, return token               → got token, write code 33
                                   → done, return token

Remote readLoop:
reads code 17 → getProto(17) → eth  → eth.in <- msg  → eth sees code 1
reads code 33 → getProto(33) → snap → snap.in <- msg → snap sees code 0

Q3: How does Kademlia DHT node discovery work?#

The problem#

Ethereum has no central server. When a new node starts, how does it find other nodes on the network? The answer is Kademlia Distributed Hash Table (DHT) — a fully decentralized node discovery mechanism.

Core concept: XOR distance#

Kademlia defines “distance” between two nodes using XOR:

Node A's ID: 0x1010... (Keccak256 of public key)
Node B's ID: 0x1001...
Distance = XOR(A, B) = 0x0011...

This is not physical distance — two nodes in New York and Tokyo can be very “close” in XOR space. XOR distance has two important properties:

  • Symmetry: XOR(A, B) = XOR(B, A) — distance from A to B equals distance from B to A
  • Triangle inequality: XOR(A, C) ≤ XOR(A, B) + XOR(B, C)

Distance level is expressed as log2(XOR). The more leading bits two IDs share, the closer they are.
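
A small sketch of that distance level over 32-byte IDs (logDist mirrors the idea, not Geth’s exact helper):

import "math/bits"

type nodeID [32]byte

// logDist returns the bit length of a XOR b: 0 for identical IDs,
// 256 when the very first bit already differs.
func logDist(a, b nodeID) int {
    shared := 0 // length of the common bit prefix
    for i := range a {
        x := a[i] ^ b[i]
        if x == 0 {
            shared += 8
            continue
        }
        shared += bits.LeadingZeros8(x)
        break
    }
    return len(a)*8 - shared
}

Two IDs that share their first 8 bits have logDist 248; identical IDs have logDist 0.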

Routing table: bucket organization#

The routing table is divided into buckets by distance level:

Bucket 0: Closest nodes (most shared prefix bits)
Bucket 1: Slightly farther nodes
...
Bucket 16: Farthest nodes (almost completely different prefix)
Each bucket:
entries[16] — up to 16 active nodes, most recently used first
replacements[10] — up to 10 replacement candidates

Key design: know more detail about nearby regions, only need a few representatives for distant regions. Like real life — you know every building in your neighborhood, but only a few landmarks in another city.
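
A structural sketch of that layout, using the constants from the text (field names are simplified relative to Geth’s discover.Table):

const (
    nBuckets        = 17 // buckets 0..16
    bucketSize      = 16 // active entries per bucket
    maxReplacements = 10 // stand-by candidates per bucket
)

type tableNode struct {
    id [32]byte
    // IP, UDP/TCP ports, ENR sequence number, etc. omitted
}

type bucket struct {
    entries      []*tableNode // live nodes, most recently seen first
    replacements []*tableNode // promoted when an entry stops responding
}

type routingTable struct {
    buckets [nBuckets]bucket // indexed by the log2-distance band to our own ID
}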

Defense: IP limits#

To resist Sybil attacks (an attacker spinning up a large number of fake node identities), the routing table enforces IP limits:

Per bucket: max 2 nodes from the same /24 subnet
Whole table: max 10 nodes from the same /24 subnet

Even if an attacker controls an entire /24 subnet (256 IPs), they can only occupy limited positions in the routing table.

Four message types#

Discovery v4 has 4 core messages (plus 2 EIP-868 extensions):

PING → "Are you alive?"
PONG ← "I am, and your external IP is X" (helps NAT discovery)
FINDNODE → "Give me nodes closest to this target"
NEIGHBORS ← "Here are the closest 12 nodes"

Each packet carries an ECDSA signature. The receiver can recover the sender’s public key (i.e., node ID) from the signature, without prior key exchange.
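
Recovery uses standard secp256k1 public key recovery. A minimal sketch with go-ethereum’s crypto package (senderID is an illustrative helper, and the discv4 packet hashing is elided):

import "github.com/ethereum/go-ethereum/crypto"

// senderID returns the 65-byte uncompressed public key that produced sig.
// hash must be the 32-byte digest the sender actually signed.
func senderID(hash, sig []byte) ([]byte, error) {
    return crypto.Ecrecover(hash, sig)
}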

Lookup algorithm: finding specific nodes#

Lookup is Kademlia’s core operation — given a target ID, find the nodes closest to it in the network:

Goal: find nodes closest to target T
Round 1:
Pick the 3 nodes (alpha=3) closest to T from own routing table
Send FINDNODE(T) to them concurrently
Node A replies: [X, Y, Z] (its known nodes closest to T)
Node B replies: [Y, W, V]
Node C replies: [W, U, S]
Round 2:
Merge all results, sort by XOR distance to T
Pick the 3 closest unqueried nodes
Send FINDNODE(T) concurrently
These nodes are closer to T, so their neighbors are even closer to T
→ replies contain closer nodes
Round 3:
Continue...
Convergence:
When a round discovers no nodes closer than the best known, algorithm terminates

Why does it converge? Each round queries nodes closer to the target than the previous round. In XOR space, nodes closer to the target know more detail about the target’s neighborhood (due to the routing table’s bucket design). So each round discovers closer nodes, until no closer ones exist.
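
A self-contained toy of this loop: node IDs are plain uint64s, the network is a map from each node to the peers it knows, and findnode just returns a node’s closest known peers to the target. It mirrors the shape of the algorithm, not Geth’s implementation:

import "sort"

const (
    alpha      = 3  // concurrent queries per round
    bucketSize = 16 // how many closest results we keep
)

type network map[uint64][]uint64 // node ID → IDs of the peers it knows

// closestTo deduplicates ids, sorts them by XOR distance to target, and keeps at most n.
func closestTo(target uint64, ids []uint64, n int) []uint64 {
    seen := map[uint64]bool{}
    var out []uint64
    for _, id := range ids {
        if !seen[id] {
            seen[id] = true
            out = append(out, id)
        }
    }
    sort.Slice(out, func(i, j int) bool { return out[i]^target < out[j]^target })
    if len(out) > n {
        out = out[:n]
    }
    return out
}

// findnode simulates asking node n for the closest nodes it knows to target.
func (nw network) findnode(n, target uint64) []uint64 {
    return closestTo(target, append([]uint64(nil), nw[n]...), bucketSize)
}

// lookup keeps querying the alpha closest unasked nodes until a round
// produces nothing closer than what is already known.
func (nw network) lookup(target uint64, seed []uint64) []uint64 {
    asked := map[uint64]bool{}
    result := closestTo(target, seed, bucketSize)
    for {
        var batch []uint64
        for _, n := range result {
            if !asked[n] && len(batch) < alpha {
                batch = append(batch, n)
            }
        }
        if len(batch) == 0 {
            return result // converged: every node in the result set has been queried
        }
        for _, n := range batch {
            asked[n] = true
            result = append(result, nw.findnode(n, target)...)
        }
        result = closestTo(target, result, bucketSize)
    }
}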

Bootstrap process#

When a new node starts for the first time, the routing table is empty:

1. Start from hardcoded bootnodes
MainnetBootnodes = [
"enode://d860a01f...@18.138.108.67:30303",
"enode://22a8232c...@3.209.45.79:30303",
...
]
2. PING bootnode → receive PONG → bootnode enters routing table
3. Execute lookup(own ID)
→ send FINDNODE(myID) to bootnode
→ bootnode replies with nodes closest to myID
→ these nodes enter routing table
→ continue lookup, routing table fills rapidly
4. Node database (enode.DB) persists to disk
→ next restart doesn't need to bootstrap from scratch

Performing a lookup on your own ID is a clever design — it fills buckets across all distance ranges, because the query path passes through nodes at various distances.
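
The bootnode entries above are enode:// URLs. A minimal sketch of turning them into node records with go-ethereum’s enode package (parseBootnodes is an illustrative helper):

import "github.com/ethereum/go-ethereum/p2p/enode"

// parseBootnodes converts enode:// URLs into node records that can be
// pinged and inserted into the routing table.
func parseBootnodes(urls []string) ([]*enode.Node, error) {
    nodes := make([]*enode.Node, 0, len(urls))
    for _, u := range urls {
        n, err := enode.ParseV4(u)
        if err != nil {
            return nil, err
        }
        nodes = append(nodes, n)
    }
    return nodes, nil
}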

Discovery v5 improvements#

v5 adds to v4:

                     v4                                       v5
Authentication       ECDSA signature per packet (65 bytes)    Session keys after initial WHOAREYOU challenge
Node info            Basic IP + port                          Full ENR records (extensible key-value pairs)
Targeted discovery   Not supported                            Topic advertisement (e.g., “find light client servers”)

Both versions can run simultaneously. FairMix merges their results into a single candidate node stream for the dial scheduler.
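
A minimal sketch of that merge using enode.NewFairMix (the individual source iterators are assumed to already exist; mergedSources is an illustrative helper):

import (
    "time"

    "github.com/ethereum/go-ethereum/p2p/enode"
)

// mergedSources interleaves several discovery iterators into one stream,
// which is what the dial scheduler consumes.
func mergedSources(sources ...enode.Iterator) enode.Iterator {
    mix := enode.NewFairMix(100 * time.Millisecond) // how long to wait on one source before trying the next
    for _, src := range sources {
        mix.AddSource(src)
    }
    return mix
}

The dial scheduler then pulls candidates from this merged iterator as connection slots open up.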
