Geth(11) P2P Networking and Discovery

Every subsystem covered so far — block insertion, state management, the EVM — operates on a single node. But Ethereum is a distributed system: nodes must find each other, establish encrypted connections, and exchange protocol messages. This chapter covers geth’s networking stack, from discovering peers on the internet to multiplexing sub-protocol messages over encrypted TCP connections.


The Networking Stack at a Glance#

Geth’s P2P layer is built from three distinct systems, each operating at a different level:

+----------------------------------------------------+
|  Sub-protocols (eth/68, snap/1, ...)               |  Application
|    Each runs in its own goroutine per peer         |
+----------------------------------------------------+
|  devp2p base protocol                              |  Session
|    Handshake, capability negotiation, ping/pong    |
|    Message multiplexing across sub-protocols       |
+----------------------------------------------------+
|  RLPx encrypted transport                          |  Transport
|    ECIES handshake -> AES-CTR + Keccak-256 MAC     |
|    Snappy compression, framed messages             |
+----------------------------------------------------+
|  TCP connection                                    |  Network
+----------------------------------------------------+

+----------------------------------------------------+
|  Node discovery (discv4 / discv5)                  |  UDP
|    Kademlia DHT, ping/pong, findnode/neighbors     |
+----------------------------------------------------+

The lifecycle of a peer connection follows these steps:

  1. Discovery — find nodes via UDP-based Kademlia DHT (discv4/v5)
  2. TCP dial — connect to a candidate node’s TCP port
  3. RLPx encryption handshake — establish shared secrets via ECIES
  4. devp2p protocol handshake — exchange capabilities (supported protocols)
  5. Protocol dispatch — launch a goroutine per matched sub-protocol
  6. Message loop — read, decrypt, route messages until disconnect

The Server: Managing All Connections#

The Server struct in p2p/server.go is the top-level P2P manager. It owns the TCP listener, discovery protocols, dial scheduler, and all active peer connections:

p2p/server.go
type Server struct {
    Config // Embedded configuration

    lock         sync.Mutex
    running      bool
    listener     net.Listener
    ourHandshake *protoHandshake
    loopWG       sync.WaitGroup
    peerFeed     event.Feed
    log          log.Logger

    nodedb    *enode.DB
    localnode *enode.LocalNode
    discv4    *discover.UDPv4
    discv5    *discover.UDPv5
    discmix   *enode.FairMix
    dialsched *dialScheduler

    portMappingRegister chan *portMapping

    quit                    chan struct{}
    addtrusted              chan *enode.Node
    removetrusted           chan *enode.Node
    peerOp                  chan peerOpFunc
    peerOpDone              chan struct{}
    delpeer                 chan peerDrop
    checkpointPostHandshake chan *conn
    checkpointAddPeer       chan *conn

    inboundHistory expHeap
}

Key fields:

  • Config — embedded configuration (max peers, protocols, bootnodes, etc.).
  • discv4 / discv5 — UDP-based node discovery protocols.
  • discmix — a FairMix that merges discovery results from multiple sources (v4, v5, DNS, static nodes) into a single iterator for the dial scheduler.
  • dialsched — decides which discovered nodes to dial and when.
  • checkpointPostHandshake / checkpointAddPeer — channels that gate peer acceptance through the main loop, ensuring all peer-count checks happen in a single goroutine.

Configuration#

The Config struct controls all P2P behavior:

p2p/config.go
type Config struct {
    PrivateKey      *ecdsa.PrivateKey
    MaxPeers        int // Maximum total connections
    MaxPendingPeers int // Max concurrent handshakes (default 50)
    DialRatio       int // Ratio of inbound to dialed (default 3)

    NoDiscovery bool
    DiscoveryV4 bool
    DiscoveryV5 bool

    Name string

    BootstrapNodes   []*enode.Node // Discv4 bootstrap
    BootstrapNodesV5 []*enode.Node // Discv5 bootstrap
    StaticNodes      []*enode.Node // Always-connected peers
    TrustedNodes     []*enode.Node // Peers that bypass limits

    NetRestrict  *netutil.Netlist // IP whitelist
    NodeDatabase string           // Path to node DB
    Protocols    []Protocol       // Supported sub-protocols

    ListenAddr string
    DiscAddr   string
    NAT        nat.Interface
    Dialer     NodeDialer
    NoDial     bool

    EnableMsgEvents bool
    Logger          log.Logger
    clock           mclock.Clock
}

The most important settings:

  • MaxPeers — the hard cap on simultaneous peer connections (default 50 for mainnet). Trusted peers bypass this limit.
  • DialRatio — controls the inbound/outbound split. With the default ratio of 3, at most MaxPeers / 3 slots are used for outbound dials, and the rest accept inbound connections. This ensures the node is reachable by others.
  • Protocols — the list of sub-protocols the node supports (e.g., eth/68, snap/1). Only peers that share at least one protocol are accepted.
  • StaticNodes — peers that are always dialed and reconnected on disconnect. Used for operator-controlled peering.
  • TrustedNodes — peers that are always accepted even when MaxPeers is reached.
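As a rough configuration sketch (not a runnable program), these settings wire together like this. All values are illustrative, and `key` and `operatorPeer` are assumed to exist already:

```go
// Illustrative sketch only; assumes the go-ethereum packages p2p and
// enode are imported, and that key (*ecdsa.PrivateKey) and
// operatorPeer (*enode.Node) are defined elsewhere.
cfg := p2p.Config{
    PrivateKey:  key,      // node identity (secp256k1)
    MaxPeers:    50,       // hard cap; trusted peers bypass it
    DialRatio:   3,        // at most 50/3 = 16 outbound dials
    ListenAddr:  ":30303", // RLPx TCP listener
    StaticNodes: []*enode.Node{operatorPeer}, // always reconnected
}
srv := &p2p.Server{Config: cfg}
if err := srv.Start(); err != nil {
    // handle startup failure
}
```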

Connection Limits#

The server splits its peer budget between dialed (outbound) and inbound connections:

p2p/server.go
func (srv *Server) MaxDialedConns() (limit int) {
    if srv.NoDial || srv.MaxPeers == 0 {
        return 0
    }
    if srv.DialRatio == 0 {
        limit = srv.MaxPeers / defaultDialRatio // default: MaxPeers / 3
    } else {
        limit = srv.MaxPeers / srv.DialRatio
    }
    if limit == 0 {
        limit = 1
    }
    return limit
}

func (srv *Server) MaxInboundConns() int {
    return srv.MaxPeers - srv.MaxDialedConns()
}

With 50 max peers and ratio 3: up to 16 dialed, up to 34 inbound.
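The split can be reproduced with a small stand-alone sketch (the function name is mine, mirroring MaxDialedConns):

```go
package main

import "fmt"

const defaultDialRatio = 3

// maxDialedConns mirrors the logic of Server.MaxDialedConns above:
// the dialed share is MaxPeers/DialRatio, with a floor of one slot.
func maxDialedConns(maxPeers, dialRatio int, noDial bool) int {
	if noDial || maxPeers == 0 {
		return 0
	}
	if dialRatio == 0 {
		dialRatio = defaultDialRatio
	}
	limit := maxPeers / dialRatio
	if limit == 0 {
		limit = 1
	}
	return limit
}

func main() {
	dialed := maxDialedConns(50, 0, false)
	fmt.Println(dialed, 50-dialed) // 16 dialed, 34 inbound
}
```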


Server Startup#

Start() initializes every component in sequence:

// p2p/server.go (simplified)
func (srv *Server) Start() error {
    srv.lock.Lock()
    defer srv.lock.Unlock()
    if srv.running {
        return errors.New("server already running")
    }
    srv.running = true

    // Initialize channels
    srv.quit = make(chan struct{})
    srv.delpeer = make(chan peerDrop)
    srv.checkpointPostHandshake = make(chan *conn)
    srv.checkpointAddPeer = make(chan *conn)
    // ...

    // 1. Set up local node identity and devp2p handshake
    srv.setupLocalNode()
    // 2. Register NAT port mappings
    srv.setupPortMapping()
    // 3. Start TCP listener for inbound connections
    if srv.ListenAddr != "" {
        srv.setupListening()
    }
    // 4. Start UDP discovery (v4 and/or v5)
    srv.setupDiscovery()
    // 5. Start the dial scheduler
    srv.setupDialScheduler()
    // 6. Launch the main event loop
    go srv.run()
    return nil
}

The dial scheduler receives candidate nodes from discmix (which combines all discovery sources) and decides which to dial based on available slots:

p2p/server.go
func (srv *Server) setupDialScheduler() {
    // ...
    srv.dialsched = newDialScheduler(config, srv.discmix, srv.SetupConn)
    for _, n := range srv.StaticNodes {
        srv.dialsched.addStatic(n)
    }
}

Static nodes are added immediately. The scheduler will continuously attempt to connect to them.


The Main Loop#

The run() method is the server’s single-threaded event loop. All peer-set mutations happen here, eliminating the need for complex locking:

// p2p/server.go (simplified)
func (srv *Server) run() {
    var (
        peers        = make(map[enode.ID]*Peer)
        inboundCount = 0
        trusted      = make(map[enode.ID]bool, len(srv.TrustedNodes))
    )
    for _, n := range srv.TrustedNodes {
        trusted[n.ID()] = true
    }
    for {
        select {
        case <-srv.quit:
            // Disconnect all peers, wait for them to shut down
            for _, p := range peers {
                p.Disconnect(DiscQuitting)
            }
            // ...
            return
        case n := <-srv.addtrusted:
            trusted[n.ID()] = true
        case n := <-srv.removetrusted:
            delete(trusted, n.ID())
        case op := <-srv.peerOp:
            op(peers) // Used by Peers(), PeerCount()
            srv.peerOpDone <- struct{}{}
        case c := <-srv.checkpointPostHandshake:
            if trusted[c.node.ID()] {
                c.flags |= trustedConn
            }
            c.cont <- srv.postHandshakeChecks(peers, inboundCount, c)
        case c := <-srv.checkpointAddPeer:
            err := srv.addPeerChecks(peers, inboundCount, c)
            if err == nil {
                p := srv.launchPeer(c)
                peers[c.node.ID()] = p
                srv.dialsched.peerAdded(c)
                if p.Inbound() {
                    inboundCount++
                }
            }
            c.cont <- err
        case pd := <-srv.delpeer:
            delete(peers, pd.ID())
            srv.dialsched.peerRemoved(pd.rw)
            if pd.Inbound() {
                inboundCount--
            }
        }
    }
}

The loop handles six types of events:

  1. Shutdown — disconnect all peers, close discovery, wait for cleanup.
  2. Trust management — addtrusted / removetrusted modify the trusted set. A peer already connected can be upgraded to trusted status.
  3. Peer queries — peerOp allows Peers() and PeerCount() to safely read the peer map.
  4. Post-handshake checkpoint — after the RLPx encryption handshake, the connection is checked against MaxPeers, MaxInboundConns, duplicate connections, and self-connections.
  5. Add-peer checkpoint — after the protocol handshake, checks that at least one sub-protocol matches, then launches the peer.
  6. Peer removal — updates peer count and notifies the dial scheduler so it can fill the vacant slot.

Peer Validation#

Two validation steps gate every new connection:

p2p/server.go
func (srv *Server) postHandshakeChecks(peers map[enode.ID]*Peer, inboundCount int, c *conn) error {
    switch {
    case !c.is(trustedConn) && len(peers) >= srv.MaxPeers:
        return DiscTooManyPeers
    case !c.is(trustedConn) && c.is(inboundConn) && inboundCount >= srv.MaxInboundConns():
        return DiscTooManyPeers
    case peers[c.node.ID()] != nil:
        return DiscAlreadyConnected
    case c.node.ID() == srv.localnode.ID():
        return DiscSelf
    default:
        return nil
    }
}

func (srv *Server) addPeerChecks(peers map[enode.ID]*Peer, inboundCount int, c *conn) error {
    if len(srv.Protocols) > 0 && countMatchingProtocols(srv.Protocols, c.caps) == 0 {
        return DiscUselessPeer
    }
    return srv.postHandshakeChecks(peers, inboundCount, c)
}

Post-handshake checks run after the encryption handshake reveals the remote identity. Add-peer checks run after the protocol handshake reveals capabilities. The post-handshake checks are repeated in addPeerChecks because the peer set may have changed between the two checkpoints.


Connection Establishment#

When a TCP connection is established (either outbound dial or inbound accept), SetupConn() runs the two-phase handshake:

p2p/server.go
func (srv *Server) SetupConn(fd net.Conn, flags connFlag, dialDest *enode.Node) error {
    c := &conn{fd: fd, flags: flags, cont: make(chan error)}
    if dialDest == nil {
        c.transport = srv.newTransport(fd, nil) // inbound: no remote key yet
    } else {
        c.transport = srv.newTransport(fd, dialDest.Pubkey()) // outbound: know remote key
    }
    err := srv.setupConn(c, dialDest)
    if err != nil {
        c.close(err)
    }
    return err
}

The internal setupConn() runs both handshakes:

// p2p/server.go (simplified)
func (srv *Server) setupConn(c *conn, dialDest *enode.Node) error {
    // Phase 1: RLPx encryption handshake
    remotePubkey, err := c.doEncHandshake(srv.PrivateKey)
    if err != nil {
        return fmt.Errorf("%w: %v", errEncHandshakeError, err)
    }
    if dialDest != nil {
        c.node = dialDest
    } else {
        c.node = nodeFromConn(remotePubkey, c.fd)
    }

    // Checkpoint: validate against peer limits
    err = srv.checkpoint(c, srv.checkpointPostHandshake)
    if err != nil {
        return err
    }

    // Phase 2: devp2p protocol handshake
    phs, err := c.doProtoHandshake(srv.ourHandshake)
    if err != nil {
        return &protoHandshakeError{err: err}
    }
    // Verify identity: Keccak256(pubkey) must match node ID
    if id := c.node.ID(); !bytes.Equal(crypto.Keccak256(phs.ID), id[:]) {
        return DiscUnexpectedIdentity
    }
    c.caps, c.name = phs.Caps, phs.Name

    // Checkpoint: validate protocol match
    return srv.checkpoint(c, srv.checkpointAddPeer)
}

The checkpoint() calls send the conn to the main loop via a channel and block until the main loop responds on c.cont. This ensures all peer-count checks are serialized in the single-threaded run() loop.

Connection Types#

Every connection is tagged with flags indicating how it was established:

p2p/server.go
const (
    dynDialedConn    connFlag = 1 << iota // Discovered via DHT, dialed dynamically
    staticDialedConn                      // Static node, always reconnected
    inboundConn                           // Remote peer initiated the connection
    trustedConn                           // Bypasses MaxPeers limit
)

A connection can have multiple flags — for example, a static node that is also trusted.


RLPx: The Encrypted Transport#

Every TCP connection is wrapped in the RLPx transport (p2p/rlpx/rlpx.go), which provides authenticated encryption. The protocol has two phases: an ECIES handshake that establishes shared secrets, and a framed message protocol that encrypts all subsequent traffic.

The Handshake#

The RLPx handshake (defined in EIP-8) uses Elliptic Curve Integrated Encryption Scheme (ECIES) to establish a shared secret:

Initiator (dialer)                       Responder (listener)
       |                                         |
       |  auth message (ECIES-encrypted)         |
       |  { signature, initiator-pubkey,         |
       |    nonce, version }                     |
       | --------------------------------------> |
       |                                         | Decrypt, verify signature
       |  auth-ack message (ECIES-encrypted)     |
       |  { responder-ephemeral-pubkey,          |
       |    nonce, version }                     |
       | <-------------------------------------- |
       |                                         |
       |  Both sides derive shared secrets       |
       |  from ECDHE key agreement               |

The handshake state tracks the ephemeral keys and nonces:

p2p/rlpx/rlpx.go
type handshakeState struct {
    initiator bool
    remote    *ecies.PublicKey

    initNonce, respNonce []byte
    randomPrivKey        *ecies.PrivateKey // ephemeral ECDHE key
    remoteRandomPub      *ecies.PublicKey  // remote ephemeral key
    // ...
}

Each side generates a random ephemeral key pair for ECDHE (Elliptic Curve Diffie-Hellman Ephemeral). The auth message contains:

p2p/rlpx/rlpx.go
type authMsgV4 struct {
    Signature       [65]byte // ECDSA signature
    InitiatorPubkey [64]byte // Static public key
    Nonce           [32]byte // Random nonce
    Version         uint

    Rest []rlp.RawValue `rlp:"tail"` // Forward-compatibility
}

After both messages are exchanged, shared secrets are derived:

p2p/rlpx/rlpx.go
func (h *handshakeState) secrets(auth, authResp []byte) (Secrets, error) {
    ecdheSecret, err := h.randomPrivKey.GenerateShared(h.remoteRandomPub, sskLen, sskLen)
    if err != nil {
        return Secrets{}, err
    }
    sharedSecret := crypto.Keccak256(ecdheSecret, crypto.Keccak256(h.respNonce, h.initNonce))
    aesSecret := crypto.Keccak256(ecdheSecret, sharedSecret)
    s := Secrets{
        remote: h.remote.ExportECDSA(),
        AES:    aesSecret,
        MAC:    crypto.Keccak256(ecdheSecret, aesSecret),
    }

    // Set up MAC states (direction depends on initiator/responder role)
    mac1 := sha3.NewLegacyKeccak256()
    mac1.Write(xor(s.MAC, h.respNonce))
    mac1.Write(auth)
    mac2 := sha3.NewLegacyKeccak256()
    mac2.Write(xor(s.MAC, h.initNonce))
    mac2.Write(authResp)
    if h.initiator {
        s.EgressMAC, s.IngressMAC = mac1, mac2
    } else {
        s.EgressMAC, s.IngressMAC = mac2, mac1
    }
    return s, nil
}

The derivation chain is: ecdheSecret → sharedSecret → aesSecret → MAC key. Two separate Keccak-256 MAC states are initialized for each direction, seeded with the nonces and the raw handshake packets. This means each direction has its own MAC chain, and the handshake messages themselves are mixed into the MAC state for authentication.

Frame Encryption#

After the handshake, all messages are sent as encrypted frames:

p2p/rlpx/rlpx.go
type Conn struct {
    dialDest *ecdsa.PublicKey
    conn     net.Conn
    session  *sessionState

    snappyReadBuffer  []byte
    snappyWriteBuffer []byte
}

type sessionState struct {
    enc cipher.Stream // AES-256-CTR for encryption
    dec cipher.Stream // AES-256-CTR for decryption

    egressMAC  hashMAC // Outgoing MAC state (Keccak-256 + AES-128)
    ingressMAC hashMAC // Incoming MAC state
    rbuf       readBuffer
    wbuf       writeBuffer
}

Each frame has this wire format:

Header (32 bytes):
[frame-size: 3 bytes] [header-data: 13 bytes] ← AES-CTR encrypted
[header-MAC: 16 bytes] ← Keccak-256 MAC
Body (variable):
[frame-data: padded to 16-byte boundary] ← AES-CTR encrypted
[frame-MAC: 16 bytes] ← Keccak-256 MAC

The body contains the RLP-encoded message code followed by the payload. If Snappy compression is enabled (which it is after the protocol handshake), the payload is compressed before encryption.
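The 16-byte padding rule is simple enough to sketch (the helper name is mine):

```go
package main

import "fmt"

// paddedFrameSize rounds a payload length up to the next 16-byte
// boundary, as required before AES-CTR frame encryption.
func paddedFrameSize(n int) int {
	if n%16 == 0 {
		return n
	}
	return n + (16 - n%16)
}

func main() {
	fmt.Println(paddedFrameSize(1), paddedFrameSize(16), paddedFrameSize(17))
	// → 16 16 32
}
```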


The devp2p Base Protocol#

On top of the encrypted RLPx transport, the devp2p base protocol provides session management. It reserves message codes 0–15 for its own use:

p2p/message.go
const (
    baseProtocolVersion = 5
    baseProtocolLength  = 16 // Codes 0-15 reserved

    handshakeMsg = 0x00
    discMsg      = 0x01
    pingMsg      = 0x02
    pongMsg      = 0x03
)

The protocol handshake (handshakeMsg) is the first message exchanged after encryption. It carries the node’s identity and supported capabilities:

// p2p/peer.go (protoHandshake type defined in message.go)
type protoHandshake struct {
    Version    uint64
    Name       string // e.g. "Geth/v1.16.7-stable/linux-amd64/go1.23.0"
    Caps       []Cap  // e.g. [{eth 68}, {snap 1}]
    ListenPort uint64
    ID         []byte // secp256k1 public key (64 bytes)
}

After exchanging handshakes, both sides know each other’s capabilities. Only protocols that both sides support are activated.
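The intersection rule — keep only protocols both sides advertise, preferring the highest shared version — can be sketched with a stand-alone helper (`negotiate` is a made-up name; geth's real logic lives in matchProtocols, covered below):

```go
package main

import "fmt"

// Cap mirrors the shape of p2p.Cap: a protocol name/version pair.
type Cap struct {
	Name    string
	Version uint
}

// negotiate returns, per protocol name, the highest version that both
// sides advertise. Pairs present on only one side are dropped.
func negotiate(local, remote []Cap) map[string]uint {
	remoteSet := make(map[Cap]bool)
	for _, c := range remote {
		remoteSet[c] = true
	}
	active := make(map[string]uint)
	for _, c := range local {
		if remoteSet[c] && c.Version > active[c.Name] {
			active[c.Name] = c.Version
		}
	}
	return active
}

func main() {
	local := []Cap{{"eth", 67}, {"eth", 68}, {"snap", 1}}
	remote := []Cap{{"eth", 68}, {"snap", 1}, {"les", 4}}
	fmt.Println(negotiate(local, remote)) // map[eth:68 snap:1]
}
```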


The Peer: Message Multiplexing#

Once the handshakes complete and validation passes, the server creates a Peer and launches its run loop:

p2p/peer.go
type Peer struct {
    rw       *conn
    running  map[string]*protoRW // Active sub-protocols by name
    log      log.Logger
    created  mclock.AbsTime
    wg       sync.WaitGroup
    protoErr chan error
    closed   chan struct{}
    pingRecv chan struct{}
    disc     chan DiscReason
    events   *event.Feed
    // ...
}

Protocol Matching#

Before launching the peer, matchProtocols() aligns local and remote capabilities:

p2p/peer.go
func matchProtocols(protocols []Protocol, caps []Cap, rw MsgReadWriter) map[string]*protoRW {
    slices.SortFunc(caps, Cap.Cmp)
    offset := baseProtocolLength // Start at 16 (after base protocol codes)
    result := make(map[string]*protoRW)

outer:
    for _, cap := range caps {
        for _, proto := range protocols {
            if proto.Name == cap.Name && proto.Version == cap.Version {
                if old := result[cap.Name]; old != nil {
                    offset -= old.Length // Replace older version
                }
                result[cap.Name] = &protoRW{
                    Protocol: proto,
                    offset:   offset,
                    in:       make(chan Msg),
                    w:        rw,
                }
                offset += proto.Length
                continue outer
            }
        }
    }
    return result
}

Each matched protocol gets a contiguous range of message codes starting from offset 16. For example, if eth/68 uses 17 message codes and snap/1 uses 8, then eth gets codes 16–32 and snap gets 33–40. If both sides support multiple versions of the same protocol, only the highest matching version is used.

The Peer Run Loop#

Peer.run() coordinates all goroutines for a single peer connection:

// p2p/peer.go (simplified)
func (p *Peer) run() (remoteRequested bool, err error) {
    var (
        writeStart = make(chan struct{}, 1)
        writeErr   = make(chan error, 1)
        readErr    = make(chan error, 1)
        reason     DiscReason
    )
    p.wg.Add(2)
    go p.readLoop(readErr)
    go p.pingLoop()

    // Allow the first write
    writeStart <- struct{}{}
    p.startProtocols(writeStart, writeErr)

    // Wait for any error
loop:
    for {
        select {
        case err = <-writeErr:
            if err != nil {
                reason = DiscNetworkError // network error
                break loop
            }
            writeStart <- struct{}{} // allow next write
        case err = <-readErr:
            if r, ok := err.(DiscReason); ok {
                remoteRequested = true
                reason = r // remote disconnect
            } else {
                reason = DiscNetworkError // read error
            }
            break loop
        case err = <-p.protoErr:
            reason = discReasonForError(err) // protocol handler error
            break loop
        case err = <-p.disc:
            reason = discReasonForError(err) // local disconnect request
            break loop
        }
    }

    close(p.closed)
    p.rw.close(reason)
    p.wg.Wait()
    return remoteRequested, err
}

Three types of goroutines run concurrently:

  1. readLoop() — reads messages from the encrypted connection and dispatches them.
  2. pingLoop() — sends a PING every 15 seconds and responds to incoming PINGs with PONGs.
  3. One goroutine per protocol — each sub-protocol’s Run() function executes in its own goroutine.

The write serialization mechanism is key: writeStart is a channel with capacity 1 that acts as a token. Only one goroutine can write at a time. After a write completes, the token is returned to writeStart so the next goroutine can write. This prevents message interleaving on the wire without requiring a mutex.
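A stripped-down sketch of that token pattern (`serializedWrites` is hypothetical; the real token is shared between the ping loop and the protocol handlers):

```go
package main

import (
	"fmt"
	"sync"
)

// serializedWrites spawns n writers that all contend for a single
// write token: a channel of capacity 1. A writer must receive the
// token before touching the shared output and returns it afterwards,
// so writes never interleave even though there is no mutex.
func serializedWrites(n int) []string {
	writeStart := make(chan struct{}, 1)
	writeStart <- struct{}{} // allow the first write

	var wg sync.WaitGroup
	out := make([]string, 0, n)
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func(id int) {
			defer wg.Done()
			<-writeStart // acquire the token
			out = append(out, fmt.Sprintf("msg-%d", id))
			writeStart <- struct{}{} // release for the next writer
		}(i)
	}
	wg.Wait()
	return out
}

func main() {
	fmt.Println(len(serializedWrites(4))) // all 4 writes completed, one at a time
}
```

The channel send/receive pairs also establish happens-before ordering, so the shared slice needs no additional synchronization.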

Message Dispatch#

The readLoop() reads raw messages and routes them:

p2p/peer.go
func (p *Peer) readLoop(errc chan<- error) {
    defer p.wg.Done()
    for {
        msg, err := p.rw.ReadMsg()
        if err != nil {
            errc <- err
            return
        }
        msg.ReceivedAt = time.Now()
        if err = p.handle(msg); err != nil {
            errc <- err
            return
        }
    }
}

func (p *Peer) handle(msg Msg) error {
    switch {
    case msg.Code == pingMsg:
        msg.Discard()
        p.pingRecv <- struct{}{}
    case msg.Code == discMsg:
        return decodeDisconnectMessage(msg.Payload)
    case msg.Code < baseProtocolLength:
        return msg.Discard() // ignore other base protocol msgs
    default:
        // Sub-protocol message: find the owning protocol by code range
        proto, err := p.getProto(msg.Code)
        if err != nil {
            return fmt.Errorf("msg code out of range: %v", msg.Code)
        }
        proto.in <- msg // deliver to protocol handler
    }
    return nil
}

Base protocol messages (codes 0–15) are handled directly. Sub-protocol messages are routed to the appropriate protoRW.in channel based on which protocol’s code range the message code falls in.

The protoRW Wrapper#

Each sub-protocol handler reads and writes through a protoRW that transparently translates between protocol-local message codes and wire codes:

p2p/peer.go
type protoRW struct {
    Protocol
    in     chan Msg        // receives read messages
    closed <-chan struct{} // peer shutdown signal
    wstart <-chan struct{} // write serialization token
    werr   chan<- error    // write result
    offset uint64          // base message code offset
    w      MsgWriter
}

func (rw *protoRW) WriteMsg(msg Msg) error {
    if msg.Code >= rw.Length {
        return newPeerError(errInvalidMsgCode, "not handled")
    }
    msg.Code += rw.offset // translate to wire code
    select {
    case <-rw.wstart: // wait for write token
        err := rw.w.WriteMsg(msg)
        rw.werr <- err // report result
        return err
    case <-rw.closed:
        return ErrShuttingDown
    }
}

func (rw *protoRW) ReadMsg() (Msg, error) {
    select {
    case msg := <-rw.in:
        msg.Code -= rw.offset // translate from wire code
        return msg, nil
    case <-rw.closed:
        return Msg{}, io.EOF
    }
}

The protocol handler sees clean message codes starting from 0, without knowing about the wire-level offset. This allows protocols to be composed without code conflicts.
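The translation is pure arithmetic and can be sketched in isolation (`protoCodec` is a made-up name for illustration):

```go
package main

import (
	"errors"
	"fmt"
)

// protoCodec models protoRW's code translation: handlers use local
// codes 0..length-1, the wire carries codes offset..offset+length-1.
type protoCodec struct {
	offset, length uint64
}

// toWire shifts a handler-local code into the protocol's wire range,
// rejecting codes outside the protocol's declared length.
func (p protoCodec) toWire(code uint64) (uint64, error) {
	if code >= p.length {
		return 0, errors.New("invalid msg code")
	}
	return code + p.offset, nil
}

// fromWire shifts a wire code back into the handler-local range.
func (p protoCodec) fromWire(code uint64) uint64 { return code - p.offset }

func main() {
	eth := protoCodec{offset: 16, length: 17} // eth/68: wire codes 16..32
	wire, _ := eth.toWire(3)
	fmt.Println(wire, eth.fromWire(wire)) // 19 3
}
```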


The Protocol Struct#

Sub-protocols are registered with the server via the Protocol struct:

p2p/protocol.go
type Protocol struct {
    Name           string                                   // Protocol name (e.g. "eth")
    Version        uint                                     // Protocol version (e.g. 68)
    Length         uint64                                   // Number of message codes used
    Run            func(peer *Peer, rw MsgReadWriter) error // Handler function
    NodeInfo       func() interface{}                       // Optional: local node info
    PeerInfo       func(id enode.ID) interface{}            // Optional: per-peer info
    DialCandidates enode.Iterator                           // Optional: protocol-specific discovery
    Attributes     []enr.Entry                              // Optional: ENR attributes
}

The Run function is called in a new goroutine when the protocol is negotiated with a peer. It should read and write messages via rw until the connection closes. The Ethereum wire protocol (eth/68) and the snap sync protocol (snap/1) are both registered as Protocol instances — their implementation is covered in Chapter 12.

Capabilities are advertised during the protocol handshake as name/version pairs:

p2p/protocol.go
type Cap struct {
    Name    string
    Version uint
}

Node Discovery#

Before geth can connect to peers, it must find them. Discovery runs over UDP, separate from the TCP-based RLPx connections, using a Kademlia-like distributed hash table.

Node Identity#

Every node is identified by a Node struct:

p2p/enode/node.go
type Node struct {
    r        enr.Record // Signed Ethereum Node Record
    id       ID         // 32-byte Keccak-256 of secp256k1 public key
    hostname string     // Optional DNS name
    ip       netip.Addr // Chosen IP address
    udp      uint16     // UDP port (discovery)
    tcp      uint16     // TCP port (RLPx)
}

The ID type is a 32-byte hash ([32]byte), computed as Keccak256(secp256k1_pubkey). Nodes are addressable via enode URLs:

enode://<hex-pubkey>@<ip>:<tcp-port>

For example:

enode://d860a01f...db1f666@18.138.108.67:30303

The 128-character hex string is the uncompressed secp256k1 public key (64 bytes). The Ethereum Node Record (ENR), defined in EIP-778, provides a more extensible format: a signed, versioned key-value record that can carry IP addresses, ports, and protocol-specific attributes.

Discovery v4: Kademlia DHT#

The primary discovery protocol (p2p/discover/v4_udp.go) implements a simplified Kademlia DHT over UDP:

p2p/discover/v4_udp.go
type UDPv4 struct {
    conn      UDPConn
    priv      *ecdsa.PrivateKey
    localNode *enode.LocalNode
    db        *enode.DB
    tab       *Table // Kademlia routing table
    // ...
    addReplyMatcher chan *replyMatcher
    gotreply        chan reply
}

Four packet types form the core protocol:

p2p/discover/v4wire/v4wire.go
const (
    PingPacket        = iota + 1 // 1
    PongPacket                   // 2
    FindnodePacket               // 3
    NeighborsPacket              // 4
    ENRRequestPacket             // 5 (EIP-868)
    ENRResponsePacket            // 6 (EIP-868)
)

  • PING — Verify a node is alive. Contains sender/recipient endpoints and an ENR sequence number.
  • PONG — Reply to PING. Echoes the sender’s observed IP (used for NAT discovery).
  • FINDNODE — Request the nodes closest to a target public key.
  • NEIGHBORS — Reply to FINDNODE. Contains up to 12 node records.
  • ENRRequest / ENRResponse — Request/return a node’s full ENR record (EIP-868 extension).

Each discovery packet has this wire format:

[32 bytes] MAC = Keccak-256(signature || packet-type || packet-data)
[65 bytes] Signature = ECDSA signature over (packet-type || packet-data)
[1 byte] Packet type
[variable] RLP-encoded packet data

The MAC provides integrity checking. The ECDSA signature authenticates the sender and allows the recipient to recover the sender’s public key (and thus node ID) without a prior key exchange.

The Kademlia Routing Table#

The routing table (p2p/discover/table.go) organizes known nodes by their distance from the local node:

p2p/discover/table.go
const (
    alpha           = 3   // Concurrency factor
    bucketSize      = 16  // Max nodes per bucket
    maxReplacements = 10  // Replacement candidates per bucket
    hashBits        = 256 // Bits in node ID
    nBuckets        = 17  // hashBits / 15
)

type Table struct {
    mutex   sync.Mutex
    buckets [nBuckets]*bucket // Indexed by log-distance
    nursery []*enode.Node     // Bootstrap nodes
    rand    reseedingRandom
    ips     netutil.DistinctNetSet
    db      *enode.DB
    net     transport
    // ...
}

type bucket struct {
    entries      []*tableNode // Live entries, MRU first
    replacements []*tableNode // Replacement candidates
    ips          netutil.DistinctNetSet
    index        int
}

Distance is measured as the XOR of two node IDs, expressed as log2(XOR(a, b)). Nodes that are “closer” (in XOR space) to the local node fill lower-numbered buckets. Each bucket holds up to 16 nodes plus 10 replacement candidates.

IP address limits prevent Sybil attacks: at most 2 nodes from the same /24 subnet per bucket, and at most 10 from the same /24 subnet across the entire table.
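The distance metric itself fits in a few lines — log2 of the XOR of two IDs, i.e. 256 minus the number of leading zero bits of a XOR b (geth's version lives in the enode package; the function below is a stand-alone sketch):

```go
package main

import (
	"fmt"
	"math/bits"
)

// logDist returns the Kademlia log-distance between two 32-byte node
// IDs: 256 minus the number of leading zero bits of their XOR.
// Identical IDs have distance 0; IDs differing in the first bit, 256.
func logDist(a, b [32]byte) int {
	lz := 0
	for i := range a {
		x := a[i] ^ b[i]
		if x == 0 {
			lz += 8
			continue
		}
		lz += bits.LeadingZeros8(x)
		break
	}
	return len(a)*8 - lz
}

func main() {
	var a, b [32]byte
	b[0] = 0x80 // differ in the very first bit
	fmt.Println(logDist(a, a), logDist(a, b)) // 0 256
}
```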

The Lookup Algorithm#

A lookup finds the nodes closest to a target key. It is the core operation of the Kademlia DHT:

  1. Pick the alpha (3) closest known nodes to the target from the routing table.
  2. Send FINDNODE to those nodes concurrently.
  3. Each FINDNODE returns up to 12 neighbors.
  4. Add the new neighbors to a result set, sorted by distance to target.
  5. Pick the next alpha closest nodes that haven’t been queried yet.
  6. Repeat until no closer nodes are found.

The lookup converges because each round discovers nodes that are progressively closer to the target in XOR space.
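The selection step of each round — sort candidates by XOR distance to the target and take the alpha closest not yet queried — can be sketched with IDs shrunk to single bytes to keep the example small (`closest` is a made-up name):

```go
package main

import (
	"fmt"
	"sort"
)

const alpha = 3 // Kademlia concurrency factor

// closest sorts candidates by XOR distance to the target and returns
// the alpha nearest ones that have not been queried yet.
func closest(target byte, candidates []byte, queried map[byte]bool) []byte {
	sort.Slice(candidates, func(i, j int) bool {
		return candidates[i]^target < candidates[j]^target
	})
	var picked []byte
	for _, c := range candidates {
		if !queried[c] && len(picked) < alpha {
			picked = append(picked, c)
		}
	}
	return picked
}

func main() {
	nodes := []byte{0x01, 0x02, 0x10, 0x11, 0x80}
	// 0x01 was already queried, so the next three nearest are picked.
	fmt.Println(closest(0x00, nodes, map[byte]bool{0x01: true})) // [2 16 17]
}
```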

Bootstrapping#

On first startup, the routing table is empty. Geth bootstraps by contacting bootnodes — hardcoded, well-known nodes run by the Ethereum Foundation and other organizations:

params/bootnodes.go
var MainnetBootnodes = []string{
    "enode://d860a01f...db1f666@18.138.108.67:30303",
    "enode://22a8232c...a68d4de@3.209.45.79:30303",
    "enode://2b252ab6...e6ffc@65.108.70.101:30303",
    "enode://4aeb4ab6...82052@157.90.35.166:30303",
}

The bootnodes are added to the table’s nursery. When the table needs to be populated (on startup, or when too many nodes go offline), a refresh() operation performs a lookup for the local node’s own ID — this fills the routing table with nodes from all distance ranges.

The node database (enode.DB, stored on disk) persists known nodes across restarts, so a node that has been running before does not need to re-bootstrap from scratch.

Discovery v5#

Discovery v5 (p2p/discover/v5_udp.go) is a newer protocol that adds:

  • Session-based encryption — after an initial WHOAREYOU challenge, messages are encrypted with session keys, unlike v4 where every packet carries a full ECDSA signature.
  • ENR as first-class records — nodes advertise their capabilities via ENR attributes, enabling protocol-specific peer filtering.
  • Topic advertisement — nodes can register interest in topics, allowing targeted peer discovery (e.g., finding light client servers).

Both v4 and v5 can run simultaneously. The FairMix in the server merges results from both into a single stream of candidate nodes for the dial scheduler.


NAT Traversal#

Nodes behind Network Address Translation (NAT) need help making their listening port reachable. The nat.Interface in p2p/nat/nat.go provides an abstraction:

p2p/nat/nat.go
type Interface interface {
    AddMapping(protocol string, extport, intport int,
        name string, lifetime time.Duration) (uint16, error)
    DeleteMapping(protocol string, extport, intport int) error
    ExternalIP() (net.IP, error)
    String() string
}

Supported mechanisms:

  • "none" — No NAT traversal.
  • "extip:IP" — Assume reachable on the given external IP.
  • "upnp" — Universal Plug and Play: queries the router for port mappings.
  • "pmp" — NAT-PMP, a simpler alternative to UPnP, common on Apple routers.
  • "stun" — Discover the external IP via a STUN server.
  • "any" — Auto-detect the first available mechanism.

During startup, the server calls setupPortMapping() which registers both TCP (RLPx) and UDP (discovery) ports with the NAT gateway. The mapping is periodically refreshed to keep it alive.

Additionally, the PONG response in discovery v4 echoes the sender’s observed IP address, providing a second mechanism for a node to discover its external address.


Inbound Connection Handling#

The listenLoop() accepts incoming TCP connections:

// p2p/server.go (simplified)
func (srv *Server) listenLoop() {
    // Limit concurrent handshakes
    tokens := defaultMaxPendingPeers // 50
    if srv.MaxPendingPeers > 0 {
        tokens = srv.MaxPendingPeers
    }
    slots := make(chan struct{}, tokens)
    for i := 0; i < tokens; i++ {
        slots <- struct{}{}
    }
    for {
        <-slots // wait for a free handshake slot
        fd, err := srv.listener.Accept()
        // ...
        go func() {
            defer func() { slots <- struct{}{} }()
            srv.SetupConn(fd, inboundConn, nil)
        }()
    }
}

Each accepted connection consumes a handshake slot. The slot is returned after the handshake completes (or fails), ensuring at most MaxPendingPeers concurrent handshakes. Inbound connections are also rate-limited by source IP — the inboundHistory expiration heap ensures at most one connection per IP per 30 seconds, defending against connection-flood attacks.
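The slot pattern is easy to verify in isolation: a buffered channel pre-filled with N tokens caps concurrency at N (`runWithSlots` is a hypothetical harness that tracks peak concurrency):

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// runWithSlots processes jobs through a token channel of the given
// capacity, as in listenLoop, and reports the peak number of jobs
// that were ever in flight simultaneously.
func runWithSlots(tokens, jobs int) int32 {
	slots := make(chan struct{}, tokens)
	for i := 0; i < tokens; i++ {
		slots <- struct{}{}
	}
	var inFlight, peak int32
	var wg sync.WaitGroup
	for i := 0; i < jobs; i++ {
		<-slots // wait for a free slot before accepting work
		wg.Add(1)
		go func() {
			defer wg.Done()
			defer func() { slots <- struct{}{} }() // return the slot
			n := atomic.AddInt32(&inFlight, 1)
			for { // record the high-water mark
				p := atomic.LoadInt32(&peak)
				if n <= p || atomic.CompareAndSwapInt32(&peak, p, n) {
					break
				}
			}
			atomic.AddInt32(&inFlight, -1)
		}()
	}
	wg.Wait()
	return atomic.LoadInt32(&peak)
}

func main() {
	fmt.Println(runWithSlots(5, 50) <= 5) // peak never exceeds the slot count
}
```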


Putting It All Together#

Here is the complete lifecycle of a peer connection:

  1. Discovery fills the routing table with candidate nodes via UDP PING/PONG and FINDNODE/NEIGHBORS exchanges.
  2. The dial scheduler picks candidates from the FairMix iterator and dials their TCP port. Static nodes are always dialed; dynamic nodes are dialed to fill remaining slots.
  3. RLPx encryption handshake establishes shared AES and MAC keys via ECIES. Both sides now know each other’s static public key.
  4. The server’s main loop validates the connection at the post-handshake checkpoint: checks MaxPeers, duplicate connections, and self-connections.
  5. devp2p protocol handshake exchanges capabilities. The server’s main loop validates at the add-peer checkpoint: ensures at least one matching sub-protocol.
  6. launchPeer() creates a Peer, matches protocols, and starts goroutines: one for reading, one for pings, and one per matched sub-protocol.
  7. Message flow: the readLoop decrypts and decompresses messages, then routes them — base protocol messages (ping/pong/disconnect) are handled directly, sub-protocol messages are dispatched to the appropriate protoRW.in channel.
  8. Disconnection can be triggered by: a network error, the remote peer sending a discMsg, a protocol handler returning an error, or a local shutdown. The Peer.run() loop closes the connection, waits for all goroutines, and sends a peerDrop to the server’s main loop, which removes the peer and notifies the dial scheduler.
https://kehaozheng.vercel.app/posts/chainethgeth/11_p2p_networking_and_discovery/
Author
Kehao Zheng
Published at
2026-04-20
License
CC BY-NC-SA 4.0
