Q1: What is the complete lifecycle of a peer connection, from discovery to disconnect?
Network stack overview
Geth’s P2P network consists of two independent channels:
```
UDP channel (node discovery)        TCP channel (data transport)
┌─────────────────────────┐         ┌──────────────────────────────┐
│ discv4 / discv5         │         │ Sub-protocols (eth/68, snap) │
│ Kademlia DHT            │         │ devp2p base protocol         │
│ PING/PONG/FINDNODE      │         │ RLPx encrypted transport     │
└─────────────────────────┘         └──────────────────────────────┘
```

UDP only handles “discover who’s online.” TCP handles “actual communication.” The two are completely independent — discovering a node doesn’t mean connecting to it, and connecting to a node doesn’t require discovering it (e.g., static nodes).
Complete lifecycle: 6 stages
```
Stage 1: Discovery (UDP)
  Routing table populated via PING/PONG/FINDNODE with candidate nodes
    │
    ▼
Stage 2: Dial (TCP)
  Dial scheduler picks candidates from FairMix, establishes TCP connection
    │
    ▼
Stage 3: RLPx encryption handshake
  ECIES key exchange → derive AES + MAC keys
  Checkpoint: main loop checks MaxPeers, duplicates, self-connection
    │
    ▼
Stage 4: devp2p protocol handshake
  Exchange capabilities (eth/68, snap/1, etc.)
  Checkpoint: main loop checks at least one matching sub-protocol
    │
    ▼
Stage 5: Message loop
  readLoop decrypts → routes (base protocol handled directly,
    sub-protocols dispatched to protoRW)
  pingLoop heartbeat every 15 seconds
  One goroutine per matched sub-protocol handles messages
    │
    ▼
Stage 6: Disconnect
  Network error / remote disconnect / protocol error / local shutdown
  → Peer.run() exits → notify server main loop → dial scheduler fills vacancy
```

Stages 1-2: Discovery and dialing
Discovery fills the routing table via UDP (see Q3). Multiple discovery sources (discv4, discv5, DNS, static nodes) are merged through FairMix into a single iterator.
The dial scheduler consumes this iterator and decides which nodes to dial based on available slots:
```
MaxPeers = 50, DialRatio = 3

Outbound slots = 50 / 3  = 16  (actively dialed)
Inbound slots  = 50 - 16 = 34  (waiting for others to connect)
```

Static nodes are always dialed and automatically reconnected on disconnect. Dynamic nodes fill the remaining slots.
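The slot split is plain integer arithmetic. A minimal sketch (`dialSlots` is an illustrative name, not Geth's actual API):

```go
package main

import "fmt"

// dialSlots mirrors the split described above: outbound slots are
// MaxPeers / DialRatio, and everything left over waits for inbound peers.
func dialSlots(maxPeers, dialRatio int) (outbound, inbound int) {
	outbound = maxPeers / dialRatio
	inbound = maxPeers - outbound
	return
}

func main() {
	out, in := dialSlots(50, 3)
	fmt.Println(out, in) // 16 34
}
```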
Stage 3: RLPx encryption handshake
After TCP connection is established, SetupConn() starts the two-phase handshake. Phase 1 is RLPx encryption:
```
Initiator (dialer)                             Responder (listener)
      │                                               │
      │ auth message (ECIES encrypted)                │
      │ { signature, initiator pubkey, nonce }        │
      │ ────────────────────────────────────────────→ │
      │                                               │ Decrypt, verify signature
      │ auth-ack message (ECIES encrypted)            │
      │ { responder ephemeral pubkey, nonce }         │
      │ ←──────────────────────────────────────────── │
      │                                               │
      │ Both sides derive shared secrets:             │
      │   ecdheSecret → sharedSecret                  │
      │     → aesSecret → MAC key                     │
```

After the handshake, the connection upgrades to an encrypted channel: AES-256-CTR encryption plus Keccak-256 MAC authentication. All subsequent messages are sent as encrypted frames.
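The derivation chain in the diagram can be sketched as repeated hashing. This is an illustrative sketch only: `hash2` and `deriveSecrets` are made-up names, and SHA-256 stands in for the Keccak-256 the real RLPx transport uses, so the outputs are not wire-compatible.

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// hash2 hashes the concatenation of two byte strings. SHA-256 is a
// stand-in for Keccak-256 here.
func hash2(a, b []byte) []byte {
	h := sha256.New()
	h.Write(a)
	h.Write(b)
	return h.Sum(nil)
}

// deriveSecrets follows the chain named in the diagram:
// ecdheSecret → sharedSecret → aesSecret → MAC key.
func deriveSecrets(ecdhe, initNonce, respNonce []byte) (shared, aes, mac []byte) {
	shared = hash2(ecdhe, hash2(respNonce, initNonce))
	aes = hash2(ecdhe, shared)  // AES-256-CTR key
	mac = hash2(ecdhe, aes)     // MAC key
	return
}

func main() {
	// Dummy inputs; in reality ecdhe comes from the ephemeral ECDH exchange.
	shared, aes, mac := deriveSecrets([]byte("ecdhe"), []byte("n1"), []byte("n2"))
	fmt.Println(len(shared), len(aes), len(mac)) // 32 32 32
}
```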
The connection then passes through the first checkpoint — sent via channel to the server’s main loop run(), where single-threaded checks occur:
```
// Check order:
1. Not trusted and peer count full?    → DiscTooManyPeers
2. Not trusted and inbound count full? → DiscTooManyPeers
3. Already connected to same node?     → DiscAlreadyConnected
4. Connected to self?                  → DiscSelf
```

Why route checks through the main loop instead of checking in place? Because peer counting must be serialized in a single thread — multiple concurrent handshakes each reading the peer map would create race conditions.
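The serialization idea can be shown with a toy admission checker: one goroutine owns the peer map, and connections arrive over a channel, so the map is never read concurrently. All names (`conn`, `runChecker`) are illustrative, not Geth's types.

```go
package main

import "fmt"

type conn struct {
	id      string
	trusted bool
}

// runChecker serializes admission checks in a single goroutine,
// mirroring how the server main loop owns the peer set.
func runChecker(in <-chan conn, out chan<- error, maxPeers int) {
	peers := map[string]bool{}
	for c := range in {
		switch {
		case !c.trusted && len(peers) >= maxPeers:
			out <- fmt.Errorf("too many peers")
		case peers[c.id]:
			out <- fmt.Errorf("already connected")
		default:
			peers[c.id] = true
			out <- nil
		}
	}
}

func main() {
	in, out := make(chan conn), make(chan error)
	go runChecker(in, out, 2)
	for _, c := range []conn{{"a", false}, {"a", false}, {"b", false}, {"c", false}} {
		in <- c
		fmt.Println(<-out) // accept, already connected, accept, too many peers
	}
}
```

Because every handshake goroutine funnels through the same channel, no mutex is needed around the peer map.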
Stage 4: devp2p protocol handshake
With the encrypted channel established, both sides exchange protocol handshake messages:
```go
type protoHandshake struct {
	Version    uint64
	Name       string // "Geth/v1.16.7-stable/linux-amd64/go1.23.0"
	Caps       []Cap  // [{eth 68}, {snap 1}]
	ListenPort uint64
	ID         []byte // secp256k1 public key
}
```

Once both sides know each other’s supported protocols, the connection passes through the second checkpoint:
```go
func addPeerChecks(...) error {
	// At least one matching sub-protocol?
	if countMatchingProtocols(srv.Protocols, c.caps) == 0 {
		return DiscUselessPeer // No common language, disconnect
	}
	// Re-check peer counts (may have changed between checkpoints)
	return srv.postHandshakeChecks(peers, inboundCount, c)
}
```

Stage 5: Message loop
After passing both checkpoints, launchPeer() creates a Peer and starts multiple goroutines:
```
Peer.run()
 ├─ readLoop()   — reads from encrypted connection, routes by message code
 ├─ pingLoop()   — sends PING every 15 seconds, replies PONG to received PINGs
 ├─ eth/68 Run() — Ethereum protocol handler goroutine
 └─ snap/1 Run() — snap sync protocol handler goroutine
```

readLoop’s routing logic after reading a message:
```
Message codes 0-15  → base protocol (ping/pong/disconnect) handled directly
Message codes 16-32 → eth/68's protoRW.in channel
Message codes 33-40 → snap/1's protoRW.in channel
```

Stage 6: Disconnect
Any goroutine error triggers disconnect:
```
readLoop network error  ──┐
pingLoop timeout        ──┤
Protocol handler error  ──┼──→ Peer.run() exits
Remote sends discMsg    ──┤         │
Local calls Disconnect  ──┘         ▼
                        close(p.closed)  → signal all goroutines to exit
                        p.rw.close()     → close TCP connection
                        p.wg.Wait()      → wait for all goroutines
                        delpeer channel  → notify server main loop
                               │
                               ▼
                        Main loop: delete(peers, id)
                        Dial scheduler: peerRemoved → fill vacancy
```

Q2: How do multiple sub-protocols share a single TCP connection? (Message multiplexing)
The problem
Geth runs multiple protocols simultaneously — eth/68 (blocks and transactions), snap/1 (fast sync), etc. But each peer has only one TCP connection. How do multiple protocols’ messages coexist on the same connection without interference?
Message code segmentation
The core idea: give each protocol a non-overlapping range of message codes.
The devp2p base protocol reserves codes 0–15. Sub-protocols start from 16, allocated in match order:
| Code range | Protocol |
|---|---|
| 0 - 15 | devp2p base protocol (handshake, ping, pong, disconnect) |
| 16 - 32 | eth/68 (17 message codes) |
| 33 - 40 | snap/1 (8 message codes) |

Allocation happens in matchProtocols():
```go
func matchProtocols(protocols []Protocol, caps []Cap, rw MsgReadWriter) map[string]*protoRW {
	offset := baseProtocolLength // start at 16
	result := make(map[string]*protoRW)

	for _, cap := range caps {
		for _, proto := range protocols {
			if proto.Name == cap.Name && proto.Version == cap.Version {
				result[cap.Name] = &protoRW{
					Protocol: proto,
					offset:   offset, // this protocol's base code
					in:       make(chan Msg),
					w:        rw,
				}
				offset += proto.Length // next protocol starts here
			}
		}
	}
	return result
}
```

If both sides support multiple versions of the same protocol (e.g., eth/67 and eth/68), only the highest version is kept — the older version’s code range is reclaimed.
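The highest-version rule can be sketched in a few lines (`highestVersions` is an illustrative helper, not the function Geth uses):

```go
package main

import "fmt"

type Cap struct {
	Name    string
	Version uint
}

// highestVersions keeps only the newest advertised version per protocol
// name, matching the rule that eth/68 supersedes eth/67.
func highestVersions(caps []Cap) map[string]uint {
	best := map[string]uint{}
	for _, c := range caps {
		if c.Version > best[c.Name] {
			best[c.Name] = c.Version
		}
	}
	return best
}

func main() {
	fmt.Println(highestVersions([]Cap{{"eth", 67}, {"eth", 68}, {"snap", 1}}))
}
```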
protoRW: transparent translation
Each sub-protocol handler reads and writes through protoRW, which automatically translates between “protocol-local codes” and “wire codes”:
```
eth/68 handler's view        protoRW translation    actual wire code
code 0 (StatusMsg)         → 0 + 16 = 16          → sends code 16
code 1 (NewBlockHashes)    → 1 + 16 = 17          → sends code 17

snap/1 handler's view
code 0 (GetAccountRange)   → 0 + 33 = 33          → sends code 33
```

Add offset on write, subtract on read:
```go
// Write: protocol-local code → wire code
func (rw *protoRW) WriteMsg(msg Msg) error {
	msg.Code += rw.offset // add offset
	// ...wait for write token, then write to connection
}

// Read: wire code → protocol-local code
func (rw *protoRW) ReadMsg() (Msg, error) {
	msg := <-rw.in
	msg.Code -= rw.offset // subtract offset
	return msg, nil
}
```

Protocol handlers have no idea what their wire codes are. eth/68’s StatusMsg is always code 0, regardless of whether it’s 16 on the wire or some other value. This completely decouples protocol implementation from protocol composition.
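A self-contained sketch of the translation, with illustrative types (`owns`, `toWire`, `fromWire` are made-up names; the real protoRW also carries channels and the write token). It also shows the range check readLoop uses to route an incoming wire code:

```go
package main

import "fmt"

type Msg struct{ Code uint64 }

type protoRW struct {
	name          string
	offset, count uint64
}

// owns reports whether a wire code falls in this protocol's range.
func (rw *protoRW) owns(code uint64) bool {
	return code >= rw.offset && code < rw.offset+rw.count
}

func (rw *protoRW) toWire(m Msg) Msg   { m.Code += rw.offset; return m }
func (rw *protoRW) fromWire(m Msg) Msg { m.Code -= rw.offset; return m }

func main() {
	eth := &protoRW{name: "eth/68", offset: 16, count: 17}
	snap := &protoRW{name: "snap/1", offset: 33, count: 8}

	wire := eth.toWire(Msg{Code: 1}) // NewBlockHashes in eth's local numbering
	fmt.Println(wire.Code)           // 17 on the wire

	// Remote side: route by range, then strip the offset back off.
	for _, rw := range []*protoRW{eth, snap} {
		if rw.owns(wire.Code) {
			fmt.Println(rw.name, rw.fromWire(wire).Code) // eth/68 1
		}
	}
}
```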
Read direction: readLoop routing
```
TCP connection → readLoop reads → message code 17
      │
      getProto(17): eth's range is [16, 33), 17 is in range
      │
      ▼
eth's protoRW.in <- msg
      │
      ▼
eth handler ReadMsg(): msg.Code = 17 - 16 = 1 (NewBlockHashesMsg)
```

Write direction: serialization token
Multiple protocol goroutines run concurrently, but there’s physically only one TCP connection. If two goroutines write simultaneously, messages would interleave and corrupt on the wire.
The solution is a capacity-1 channel as a write token:
```
writeStart channel (capacity 1, initially holds one token)

eth goroutine wants to write:
  1. <-writeStart      // take the token (block if unavailable)
  2. rw.WriteMsg()     // write the message
  3. werr <- err       // report result
  4. Peer.run() receives result, writeStart <- struct{}{}  // return token

snap goroutine wants to write:
  1. <-writeStart      // wait for eth to finish and return the token
  2. ...
```

Only one goroutine can hold the token at a time, ensuring messages are written intact without using a mutex.
Concrete example
Suppose eth/68 and snap/1 are running simultaneously:
```
eth/68 goroutine:                  snap/1 goroutine:
send NewBlockHashes (code 1)       send GetAccountRange (code 0)
→ protoRW: 1 + 16 = 17             → protoRW: 0 + 33 = 33
→ wait for write token...          → wait for write token...
→ got token, write code 17         → waiting...
→ done, return token               → got token, write code 33
                                   → done, return token

Remote readLoop:
  reads code 17 → getProto(17) → eth  → eth.in <- msg  → eth sees code 1
  reads code 33 → getProto(33) → snap → snap.in <- msg → snap sees code 0
```

Q3: How does Kademlia DHT node discovery work?
The problem
Ethereum has no central server. When a new node starts, how does it find other nodes on the network? The answer is Kademlia Distributed Hash Table (DHT) — a fully decentralized node discovery mechanism.
Core concept: XOR distance
Kademlia defines “distance” between two nodes using XOR:
```
Node A's ID: 0x1010...  (Keccak256 of public key)
Node B's ID: 0x1001...

Distance = XOR(A, B) = 0x0011...
```

This is not physical distance — two nodes in New York and Tokyo can be very “close” in XOR space. XOR distance has two important properties:
- Symmetry: `XOR(A, B) = XOR(B, A)` — the distance from A to B equals the distance from B to A
- Triangle inequality: `XOR(A, C) ≤ XOR(A, B) + XOR(B, C)`
The distance level is log2 of the XOR value. The more leading bits two IDs share, the smaller the XOR, and the closer the nodes are.
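The distance level is just the bit length of XOR(a, b). A one-byte sketch (`logdist` is an illustrative name; real node IDs are 256 bits):

```go
package main

import (
	"fmt"
	"math/bits"
)

// logdist returns the distance level between two IDs: the bit length
// of their XOR. Identical IDs are at level 0; more shared leading bits
// mean a lower level.
func logdist(a, b byte) int {
	return 8 - bits.LeadingZeros8(a^b)
}

func main() {
	fmt.Println(logdist(0b10100000, 0b10010000)) // share 2 leading bits → level 6
	fmt.Println(logdist(0b10100000, 0b10100001)) // share 7 leading bits → level 1
	fmt.Println(logdist(0xAA, 0xAA))             // identical → level 0
}
```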
Routing table: bucket organization
The routing table is divided into buckets by distance level:
```
Bucket 0:  Closest nodes (most shared prefix bits)
Bucket 1:  Slightly farther nodes
...
Bucket 16: Farthest nodes (almost completely different prefix)

Each bucket:
  entries[16]      — up to 16 active nodes, most recently used first
  replacements[10] — up to 10 replacement candidates
```

Key design: know more detail about nearby regions, only a few representatives for distant regions. Like real life — you know every building in your neighborhood, but only a few landmarks in another city.
Defense: IP limits
To prevent Sybil attacks (attacker generating massive fake nodes), the routing table has IP limits:
```
Per bucket:  max 2 nodes from the same /24 subnet
Whole table: max 10 nodes from the same /24 subnet
```

Even if an attacker controls an entire /24 subnet (256 IPs), they can only occupy limited positions in the routing table.
Four message types
Discovery v4 has 4 core messages (plus 2 EIP-868 extensions):
```
PING      → "Are you alive?"
PONG      ← "I am, and your external IP is X" (helps NAT discovery)
FINDNODE  → "Give me nodes closest to this target"
NEIGHBORS ← "Here are the closest 12 nodes"
```

Each packet carries an ECDSA signature. The receiver can recover the sender’s public key (i.e., node ID) from the signature, without prior key exchange.
Lookup algorithm: finding specific nodes
Lookup is Kademlia’s core operation — given a target ID, find the nodes closest to it in the network:
```
Goal: find nodes closest to target T

Round 1: Pick the 3 nodes (alpha = 3) closest to T from own routing table
         Send FINDNODE(T) to them concurrently

         Node A replies: [X, Y, Z]  (its known nodes closest to T)
         Node B replies: [Y, W, V]
         Node C replies: [W, U, S]

Round 2: Merge all results, sort by XOR distance to T
         Pick the 3 closest unqueried nodes
         Send FINDNODE(T) concurrently

         These nodes are closer to T, so their neighbors are even closer to T
         → replies contain closer nodes

Round 3: Continue...

Convergence: when a round discovers no nodes closer than the best known,
             the algorithm terminates
```

Why does it converge? Each round queries nodes closer to the target than the previous round. In XOR space, nodes closer to the target know more detail about the target’s neighborhood (due to the routing table’s bucket design). So each round discovers closer nodes, until no closer ones exist.
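The heart of each round, "sort by XOR distance, pick the alpha closest unqueried nodes", can be sketched with one-byte IDs standing in for 256-bit node IDs (`pickClosest` is an illustrative helper):

```go
package main

import (
	"fmt"
	"sort"
)

// pickClosest sorts candidates by XOR distance to the target and
// returns the alpha closest that have not been queried yet.
func pickClosest(candidates []byte, queried map[byte]bool, target byte, alpha int) []byte {
	sort.Slice(candidates, func(i, j int) bool {
		return candidates[i]^target < candidates[j]^target
	})
	next := []byte{}
	for _, c := range candidates {
		if !queried[c] && len(next) < alpha {
			next = append(next, c)
		}
	}
	return next
}

func main() {
	// 0x11 was already queried last round; the next round targets the
	// three closest remaining candidates.
	got := pickClosest([]byte{0x1F, 0x11, 0x80, 0x13}, map[byte]bool{0x11: true}, 0x10, 3)
	fmt.Printf("% X\n", got) // 13 1F 80
}
```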
Bootstrap process
When a new node starts for the first time, the routing table is empty:
```
1. Start from hardcoded bootnodes
   MainnetBootnodes = [
     "enode://d860a01f...@18.138.108.67:30303",
     "enode://22a8232c...@3.209.45.79:30303",
     ...
   ]

2. PING bootnode → receive PONG → bootnode enters routing table

3. Execute lookup(own ID)
   → send FINDNODE(myID) to bootnode
   → bootnode replies with nodes closest to myID
   → these nodes enter routing table
   → continue lookup, routing table fills rapidly

4. Node database (enode.DB) persists to disk
   → next restart doesn't need to bootstrap from scratch
```

Performing a lookup on your own ID is a clever design — it fills buckets across all distance ranges, because the query path passes through nodes at various distances.
Discovery v5 improvements
v5 adds to v4:
| | v4 | v5 |
|---|---|---|
| Authentication | ECDSA signature per packet (65 bytes) | Session keys after initial WHOAREYOU challenge |
| Node info | Basic IP + port | Full ENR records (extensible key-value pairs) |
| Targeted discovery | Not supported | Topic advertisement (e.g., “find light client servers”) |
Both versions can run simultaneously. FairMix merges their results into a single candidate node stream for the dial scheduler.
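The merging idea can be sketched as a round-robin over finite candidate lists. This is a simplification: the real FairMix is a live iterator with per-source timeouts, and `fairMix` here is an illustrative name.

```go
package main

import "fmt"

// fairMix interleaves candidate sources round-robin, the way FairMix
// merges discv4, discv5, DNS, and static-node iterators into one stream.
func fairMix(sources ...[]string) []string {
	var out []string
	for i := 0; ; i++ {
		emitted := false
		for _, s := range sources {
			if i < len(s) {
				out = append(out, s[i])
				emitted = true
			}
		}
		if !emitted {
			return out
		}
	}
}

func main() {
	fmt.Println(fairMix(
		[]string{"v4-node1", "v4-node2"},
		[]string{"dns-node1"},
	)) // [v4-node1 dns-node1 v4-node2]
}
```

Interleaving keeps one slow or spammy source from monopolizing the dial scheduler's candidate stream.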
Some information may be outdated






