Geth(3) QA - Kehao Zheng's Website

Q1: How do the four node types in geth’s MPT implementation work?#

Geth uses four concrete types to represent all trie nodes (trie/node.go):

1
fullNode  struct { Children [17]node; flags nodeFlag }
2
shortNode struct { Key []byte; Val node; flags nodeFlag }
3
hashNode  []byte
4
valueNode []byte

fullNode — Branch node#

The MPT branch node. A nibble has 16 possible values (0x0–0xf), so there are 16 child slots. The 17th slot (Children[16]) holds a value when a key terminates exactly at this branch point. Most slots are nil (no keys with that nibble).

shortNode — Extension node OR Leaf node#

A single struct represents both extension and leaf nodes. The distinction is determined by the type of Val:

Val is fullNode/shortNode/hashNode → extension node (Key is a shared prefix)
Val is valueNode → leaf node (Key is the remaining key suffix)

Example: two keys sharing prefix 0xca:

1
shortNode{Key: [c,a], Val: fullNode{...}}    ← extension node, "ca" is common prefix
2
                            │
3
              ┌─────────────┼─────────────┐
4
          Children[f]                 Children[3]
5
              │                           │
6
  shortNode{Key:[e], Val:"alice"}  shortNode{Key:[b], Val:"bob"}
7
       ↑ leaf node                      ↑ leaf node

hashNode — Lazy-loading placeholder#

A pure engineering concept with no MPT theory equivalent. The entire state trie has hundreds of millions of nodes — they can’t all be loaded into memory. When geth reads a fullNode from disk, its Children contain 32-byte hash references instead of real nodes.

Only when a Get() path passes through a hashNode does resolveAndTrack fetch the node from disk, decode it, and replace the hashNode in-place. This is lazy loading.

valueNode — Leaf value#

Simply a []byte holding the actual data stored in the trie. In the account trie, this is the RLP-encoded account (nonce, balance, storageRoot, codeHash).

nodeFlag — Cached metadata#

1
type nodeFlag struct {
2
    hash  hashNode  // cached Keccak256 hash (reused until node is modified)
3
    dirty bool      // whether node has been modified since last commit
4
}

Every fullNode and shortNode carries a flags field. hash caches the computed hash to avoid recomputation. dirty tracks whether the node needs to be written to disk.

Summary mapping#

geth type	MPT concept	Data	Purpose
`fullNode`	Branch node	17 child slots	Path fork point, 16-way branching by nibble
`shortNode` + Val is node	Extension node	shared prefix + child	Compress consecutive single-branch paths
`shortNode` + Val is valueNode	Leaf node	remaining key + value	Store final data
`hashNode`	(no theory equivalent)	32-byte hash	Lazy-loading placeholder, loaded from disk on demand
`valueNode`	(value part of leaf)	raw bytes	The actual stored data

Q2: What does the Trie struct mean and how does it work?#

1
type Trie struct {
2
    root        node          // (1)
3
    owner       common.Hash   // (2)
4
    committed   bool          // (3)
5
    unhashed    int           // (4)
6
    uncommitted int           // (5)
7
    reader      *Reader       // (6)
8
}

Fields#

(1) root node — Entry point of the entire tree. All operations (Get, Update, Delete, Hash, Commit) start here. Can be fullNode, shortNode, hashNode, or nil (empty trie). Each Update replaces root with the new returned node.

(2) owner common.Hash — Identifies which trie this is. Ethereum has many tries:

Global account trie → owner = common.Hash{} (zero)
Each contract’s storage trie → owner = keccak256(contract address)

The owner is used by the pathdb backend to distinguish nodes: (owner, path) is the database key.

(3) committed bool — Set to true after Commit(). The Trie object cannot be used after commit because its internal node state is no longer complete. A new Trie must be created with the new root hash. This is a disposable design: create → read/write → Commit → discard → create new for next block.

(4) unhashed int — Performance counter. Incremented on every Update(). When >= 100, Hash() enables parallel hashing (multiple goroutines compute subtree hashes simultaneously). Small modifications don’t benefit from parallelism.

(5) uncommitted int — Same as unhashed but for Commit(). Enables parallel commit when > 100 modifications.

(6) reader *Reader — Disk reader. When recursion hits a hashNode, the reader fetches the real node from the database, decodes it from RLP, and returns it.

Lifecycle#

1
1. New(root_hash, db)       → reader connects to DB, root loaded from disk
2
2. Get/Update/Delete        → recurse from root, lazy-load hashNodes via reader
3
3. Hash()                   → bottom-up hash all dirty nodes, return state root
4
4. Commit()                 → collect dirty nodes into NodeSet, mark committed=true
5
5. Next block               → New() with the new root hash, start fresh

Q3: How do the trie operations (Get, Update, Delete) work? Show with a concrete example.#

Building a trie: insert three key-value pairs#

Keys converted to nibbles (hex encoding):

1
"cat" (0x636174) → [6, 3, 6, 1, 7, 4, 16]   (16 = terminator)
2
"car" (0x636172) → [6, 3, 6, 1, 7, 2, 16]
3
"dog" (0x646f67) → [6, 4, 6, f, 6, 7, 16]

Insert “cat” — empty trie, insert hits nil, creates a leaf:

1
root: shortNode{Key: [6,3,6,1,7,4,16], Val: "meow"}

Insert “car” — nibbles compared with “cat”:

1
cat: [6, 3, 6, 1, 7, 4, 16]
2
car: [6, 3, 6, 1, 7, 2, 16]
3
                    ^ diverge at position 5 (matchlen = 5)

Creates a fullNode at the fork point, wraps with shared prefix:

1
root: shortNode{Key: [6,3,6,1,7], Val: fullNode(B)}
2
                                         │
3
                          ┌──────────────┼──────────────┐
4
                      Children[2]                   Children[4]
5
                          │                             │
6
              shortNode{[16],"vroom"}        shortNode{[16],"meow"}

Insert “dog” — diverges from root’s key at position 1:

1
root.Key: [6, 3, 6, 1, 7]
2
dog:      [6, 4, 6, f, 6, 7, 16]
3
              ^ matchlen = 1

Final trie:

1
root: shortNode{Key: [6], Val: fullNode(A)}
2
                                    │
3
                     ┌──────────────┼──────────────┐
4
                 Children[3]                   Children[4]
5
                     │                             │
6
           shortNode{[6,1,7]}            shortNode{[6,f,6,7,16],"woof"}
7
               Val: fullNode(B)
8
                     │
9
       ┌─────────────┼─────────────┐
10
   Children[2]                 Children[4]
11
       │                           │
12
shortNode{[16],"vroom"}    shortNode{[16],"meow"}

Get “car”#

Nibbles: [6, 3, 6, 1, 7, 2, 16], starting at root, pos=0:

1
Step 1: shortNode{Key:[6]}  → key starts with [6]? yes → skip 1, pos=1
2
Step 2: fullNode(A)         → key[1]=3 → Children[3], pos=2
3
Step 3: shortNode{Key:[6,1,7]} → key starts with [6,1,7]? yes → skip 3, pos=5
4
Step 4: fullNode(B)         → key[5]=2 → Children[2], pos=6
5
Step 5: shortNode{Key:[16]} → key starts with [16]? yes → recurse to Val
6
Step 6: valueNode("vroom")  → return "vroom" ✓

If any step encounters a hashNode instead of a real node: call resolveAndTrack → fetch from disk → decode → replace hashNode → continue.

Delete “cat”#

Follows the same path to find fullNode(B).Children[4], sets it to nil. Now fullNode(B) has only one remaining child (Children[2]). The reduction logic triggers:

fullNode(B) collapses into shortNode{[2,16], "vroom"}
Parent shortNode{[6,1,7]} now points to a shortNode — two shortNodes chain together, so they merge: shortNode{[6,1,7,2,16], "vroom"}

The trie always stays minimal — no fullNode is left with only one child.

Q4: What is the dirty flag and how does it work?#

What it is#

Every fullNode and shortNode has flags.dirty — a boolean meaning “this node has been modified but not yet written to disk”.

When it’s set#

Every insert or delete that creates or modifies a node calls t.newFlag():

1
func (t *Trie) newFlag() nodeFlag {
2
    return nodeFlag{dirty: true}
3
}

Only nodes on the modified path become dirty. Untouched nodes remain clean with their cached hash intact.

How Hash() uses it#

1
func (h *hasher) hash(n node, force bool) []byte {
2
    if hash, _ := n.cache(); hash != nil {  // has cached hash and not dirty?
3
        return hash                          // skip! return cached hash
4
    }
5
    // ... otherwise recompute: RLP encode + Keccak256
6
}

dirty node (dirty=true, hash=nil): must recompute RLP encoding + Keccak256
clean node (dirty=false, hash=0x456…): return cached hash immediately

How Commit() uses it#

1
hash, dirty := n.cache()
2
if hash != nil && !dirty {
3
    return hash    // not dirty → already on disk, skip
4
}
5
// dirty → collect into NodeSet for writing to disk

Only dirty nodes are collected into the NodeSet and written to disk.

Why it matters#

A single block might modify a few hundred accounts, but the trie has hundreds of millions of nodes. The dirty flag ensures Hash() and Commit() only do incremental work on the modified path, not a full tree traversal.

1
Update/Delete → nodes on path get dirty=true, hash=nil
2
Hash()        → dirty nodes: recompute hash. clean nodes: return cached hash
3
Commit()      → dirty nodes: collect to NodeSet → write disk. clean nodes: skip

Q5: How is trie data stored? What is at each layer?#

The storage has three layers. Using a concrete example with two accounts (alice and bob):

Layer 1: Trie in memory — Go struct pointer chains#

1
trie.root = &shortNode{
2
    Key: [0x0a],
3
    Val: &fullNode{
4
        Children: [17]node{
5
            [0xa]: &shortNode{Key: [..., 16], Val: valueNode{0xf8, 0x44, ...}},  // alice
6
            [0xb]: &shortNode{Key: [..., 16], Val: valueNode{0xf8, 0x40, ...}},  // bob
7
        },
8
    },
9
}

These are ordinary Go objects living in the heap memory of the geth process. Shut down geth → gone.

Layer 2: triedb — in-memory byte cache#

After Trie.Commit(), dirty nodes are serialized to RLP bytes and handed to triedb. Still in memory.

hashdb backend — stores in a dirties map keyed by node hash:

1
dirties = map[hash]*cachedNode{
2
    0x7f3e...: {node: [0xc4, 0x82, ...]},   // root
3
    0x5d8a...: {node: [0xf8, 0x51, ...]},   // fullNode
4
    0x1234...: {node: [0xc8, 0x82, ...]},   // alice leaf
5
    0x5678...: {node: [0xc8, 0x82, ...]},   // bob leaf
6
}

pathdb backend — stores in diffLayer keyed by owner+path:

1
diffLayer{
2
    block: 100,
3
    nodes: {
4
        owner(0x00..00) + path([]):             [0xc4, 0x82, ...],  // root
5
        owner(0x00..00) + path([0x0a]):         [0xf8, 0x51, ...],  // fullNode
6
        owner(0x00..00) + path([0x0a,0x0a,...]): [0xc8, 0x82, ...], // alice
7
        owner(0x00..00) + path([0x0a,0x0b,...]): [0xc8, 0x82, ...], // bob
8
    }
9
}

Shut down geth → gone (unless journaled for crash recovery).

Layer 3: ethdb — actual disk persistence#

When triedb.Commit() is called, data is finally written to LevelDB/Pebble on disk.

hashdb writes — key is the node’s Keccak256 hash:

1
LevelDB key → value:
2
0x7f3e... (32-byte hash)  →  [0xc4, 0x82, ...]
3
0x5d8a... (32-byte hash)  →  [0xf8, 0x51, ...]
4
0x1234... (32-byte hash)  →  [0xc8, 0x82, ...]
5
0x5678... (32-byte hash)  →  [0xc8, 0x82, ...]

pathdb writes — key is owner + trie path:

1
LevelDB key → value:
2
0x00..00 + []               →  [0xc4, 0x82, ...]
3
0x00..00 + [0x0a]           →  [0xf8, 0x51, ...]
4
0x00..00 + [0x0a,0x0a,...]  →  [0xc8, 0x82, ...]
5
0x00..00 + [0x0a,0x0b,...]  →  [0xc8, 0x82, ...]

Shut down geth → still here. This is the only persistent layer.

Reading goes in reverse#

1
StateDB.GetBalance(alice)
2
  → trie.Get(keccak256(alice))
3
    → recurse from root, hit hashNode(0x5d8a...)
4
      → resolveAndTrack: ask triedb
5
        → hashdb: check dirties[0x5d8a...] in memory, else ethdb.Get(0x5d8a...)
6
        → pathdb: check diffLayer chain in memory, else diskLayer → ethdb
7
      → get RLP bytes → decodeNodeUnsafe → fullNode Go object
8
    → replace hashNode, continue recursing
9
  → reach valueNode → RLP decode → Account struct → return Balance

Summary diagram#

1
         Memory (Go objects)          Memory (bytes)              Disk (bytes)
2
       ┌──────────────────┐      ┌──────────────────┐      ┌──────────────────┐
3
       │     Trie         │      │     triedb        │      │     ethdb        │
4
       │                  │ ──→  │                   │ ──→  │                  │
5
       │  shortNode{      │      │  dirties map      │      │  LevelDB/Pebble  │
6
       │    fullNode{     │      │  or diffLayer     │      │                  │
7
       │      ...         │      │                   │      │  SST files       │
8
       │    }             │      │  still in memory!  │      │  WAL files       │
9
       │  }               │      │                   │      │                  │
10
       └──────────────────┘      └──────────────────┘      └──────────────────┘
11
       shutdown → gone           shutdown → gone            shutdown → survives

What else is on disk besides trie nodes#

The same chaindata LevelDB/Pebble database stores everything, distinguished by key prefixes:

1
chaindata/
2
├── Trie nodes          ← account trie + storage trie nodes
3
├── Block headers       ← key = "h" + blockNum + hash
4
├── Block bodies        ← key = "b" + blockNum + hash (transaction lists)
5
├── Receipts            ← key = "r" + blockNum + hash
6
├── Contract code       ← key = codeHash, value = bytecode
7
├── Preimages           ← key = hash, value = original address/key (optional)
8
│
9
└── ancient/ (freezer)
10
    ├── headers/        ← old block headers (append-only, compressed)
11
    ├── bodies/         ← old block bodies
12
    ├── receipts/       ← old receipts
13
    └── hashes/         ← old block hashes

Hot data (recent blocks) lives in the KV store for random access. Cold data (old blocks) is moved to the freezer (ancient database), stored sequentially by block number to save space.

Welcome