Q1: What is Chapter 5 about? How does it fit into the bigger picture?
Previous chapters covered how state is organized (StateDB, stateObject) and how it’s authenticated (Merkle Patricia Trie). But neither answered: where do all these bytes actually end up on disk?
Chapter 5 traces the complete path from in-memory state changes to on-disk bytes. It covers the bottom two layers of a four-layer architecture:
```
Layer 4: StateDB              ← Chapter 4: in-memory state read/write
   │
   ▼
Layer 3: Trie + TrieDB        ← Chapter 3: MPT node management
   │
   ▼
Layer 2: rawdb accessor layer ← This chapter: key construction, RLP encoding
   │
   ▼
Layer 1: Pebble + Freezer     ← This chapter: actual disk storage
```

Layers 3–4 were covered previously. Chapter 5 focuses on Layer 2 (rawdb), how keys are designed and data is encoded, and Layer 1 (Pebble + Freezer), where data physically lives on disk.
Q2: How does geth store many different data types in a single database?
Geth stores everything — block headers, bodies, receipts, trie nodes, contract code, snapshots — in one flat key-value database. Different data types are distinguished by single-byte key prefixes defined in core/rawdb/schema.go:
| Prefix | Key format | Value |
|---|---|---|
"h" | h + blockNum(8) + hash(32) | Block header (RLP) |
"b" | b + blockNum(8) + hash(32) | Block body (RLP) |
"r" | r + blockNum(8) + hash(32) | Receipts (RLP) |
"c" | c + codeHash(32) | Contract bytecode |
"a" | a + accountHash(32) | Snapshot: account data |
"A" | A + hexPath | Trie node (account trie, path-based) |
"O" | O + accountHash(32) + hexPath | Trie node (storage trie, path-based) |
"l" | l + txHash(32) | Transaction lookup metadata |
Key construction functions are also in schema.go:
```go
func headerKey(number uint64, hash common.Hash) []byte {
	return append(append(headerPrefix, encodeBlockNumber(number)...), hash.Bytes()...)
}
```

Block numbers are encoded as 8-byte big-endian integers, so keys within each prefix are naturally sorted by block number, which makes range scans efficient.
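The big-endian conversion itself is a one-liner; a minimal sketch along the lines of rawdb's encodeBlockNumber (uses encoding/binary):

```go
// encodeBlockNumber turns a block number into an 8-byte big-endian slice,
// so that lexicographic key order equals numeric block order.
func encodeBlockNumber(number uint64) []byte {
	enc := make([]byte, 8)
	binary.BigEndian.PutUint64(enc, number)
	return enc
}
```

For example, block 1,000,000 encodes as 0x00000000000f4240 and sorts immediately after block 999,999 (0x00000000000f423f).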
The single-byte prefix guarantees no key collisions between different data types.
Q3: What is the rawdb accessor layer and why does it exist?
core/rawdb/ provides typed accessor functions that wrap key construction + RLP encoding/decoding. Upper layers never construct raw keys or call db.Get() directly:
```go
// Writing: upper layer just passes the struct
rawdb.WriteHeader(db, header)
// Internally: RLP encode → build key "h" + num + hash → db.Put(key, data)
```
```go
// Reading: upper layer just passes hash and block number
header := rawdb.ReadHeader(db, hash, number)
// Internally: build key → db.Get(key) → RLP decode → return *types.Header
```

Reading has an extra twist: it checks two locations:
```go
func ReadHeaderRLP(db ethdb.Reader, hash common.Hash, number uint64) rlp.RawValue {
	// 1. Try the Freezer first (old canonical blocks, O(1) lookup)
	data, _ := db.Ancient("headers", number)
	if len(data) > 0 && crypto.Keccak256Hash(data) == hash {
		return data
	}
	// 2. Fall back to the key-value store (recent blocks or non-canonical forks)
	data, _ = db.Get(headerKey(number, hash))
	return data
}
```

The rawdb layer serves as a type-safe abstraction over the raw key-value store: upper layers think in terms of headers, bodies, and receipts, never raw bytes and key prefixes.
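A quick usage sketch (rawdb.NewMemoryDatabase is the in-memory backend geth uses in tests; the header here is deliberately minimal):

```go
db := rawdb.NewMemoryDatabase()                 // ethdb.Database backed by a map
header := &types.Header{Number: big.NewInt(42)} // illustration only

rawdb.WriteHeader(db, header)                  // builds "h" + num + hash internally
got := rawdb.ReadHeader(db, header.Hash(), 42) // rebuilds the key, RLP-decodes
fmt.Println(got.Number)                        // 42
```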
Q4: How does the interface design make the storage engine swappable?
Geth defines all storage contracts as interfaces in ethdb/database.go:
```go
type KeyValueReader interface {
	Has(key []byte) (bool, error)
	Get(key []byte) ([]byte, error)
}
```
```go
type KeyValueWriter interface {
	Put(key []byte, value []byte) error
	Delete(key []byte) error
}
```
```go
type KeyValueStore interface {
	KeyValueReader
	KeyValueWriter
	Batcher   // NewBatch() for atomic multi-key writes
	Iteratee  // NewIterator() for ordered key scans
	Compacter // Compact() for LSM-tree compaction
	// ...
}
```

At the top, a single interface combines the key-value store with the ancient (Freezer) store:
```go
type Database interface {
	KeyValueStore
	AncientStore
}
```

Pebble, LevelDB, and MemoryDB all implement KeyValueStore. Upper layers only depend on the interface, so switching engines is transparent:
- Production → Pebble (default)
- Testing → MemoryDB (in-memory map)
- Legacy nodes → LevelDB
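For illustration (writeMarker is a hypothetical helper, not a geth function), code written against these interfaces runs unchanged on any backend:

```go
// writeMarker depends only on the ethdb interface, so the same code runs
// against Pebble in production and against MemoryDB in tests.
func writeMarker(db ethdb.KeyValueWriter, key, value []byte) error {
	return db.Put(key, value)
}

// In a test:      db := rawdb.NewMemoryDatabase(); writeMarker(db, k, v)
// In production:  the same call receives the Pebble-backed database.
```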
Q5: How does Pebble work and what are its key design choices in geth?
Pebble is an LSM-tree key-value engine from CockroachDB, and geth’s default storage backend (replacing LevelDB). Three design choices stand out:
Async writes
```go
writeOptions: pebble.NoSync,
```

Put() and Batch.Write() return once the write has reached the WAL's buffer, without waiting for fsync. This is fast, but a power failure could lose the most recent writes. Geth tolerates this: it can recover from unclean shutdowns. Periodic background fsyncs via WALBytesPerSync limit the risk window.
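In pebble terms this is the choice between the two write options; a minimal sketch against the cockroachdb/pebble API (not geth's wrapper code):

```go
// NoSync: acknowledged once the write reaches the WAL, no fsync on the hot path.
// pebble.Sync would instead block until the log record is durably on disk.
if err := db.Set(key, value, pebble.NoSync); err != nil {
	return err
}
```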
Batch writes
Individual Put() calls are expensive — each one goes through the WAL separately. When geth needs to write many keys atomically (e.g., a block’s trie nodes), it uses Batch:
```go
batch := db.NewBatch()
batch.Put(key1, value1) // buffered in memory
batch.Put(key2, value2) // buffered in memory
batch.Write()           // atomic: all writes succeed or all fail
```

Batch buffers Put/Delete operations in memory. Nothing touches the database until Write(), which applies the entire batch atomically. An IdealBatchSize constant (100 KiB) guides callers on when to flush.
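Callers typically interleave flushes with that buffering; a sketch of the usual pattern (the pending slice is illustrative):

```go
batch := db.NewBatch()
for _, kv := range pending {
	if err := batch.Put(kv.key, kv.value); err != nil {
		return err
	}
	// Flush periodically so a huge write set doesn't accumulate in memory.
	if batch.ValueSize() >= ethdb.IdealBatchSize {
		if err := batch.Write(); err != nil {
			return err
		}
		batch.Reset()
	}
}
return batch.Write() // flush the remainder
```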
Bloom filters
Geth configures a bloom filter (10 bits per key) for the SST files at every level. When looking up a key, the bloom filter answers "is this key possibly in this file?"; if the answer is no, the file is skipped entirely, avoiding wasted disk reads. This is especially valuable for Has() checks on non-existent keys.
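Roughly how that is configured, as a sketch assuming the cockroachdb/pebble options API (packages github.com/cockroachdb/pebble and its bloom subpackage; geth's actual values live in ethdb/pebble):

```go
opt := &pebble.Options{
	// One entry per LSM level; each level's SST files get a bloom filter with
	// 10 bits per key (roughly a 1% false-positive rate).
	Levels: []pebble.LevelOptions{
		{TargetFileSize: 2 * 1024 * 1024, FilterPolicy: bloom.FilterPolicy(10)},
		{TargetFileSize: 4 * 1024 * 1024, FilterPolicy: bloom.FilterPolicy(10)},
		// ... one LevelOptions per deeper level, target file sizes doubling
	},
}
```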
Q6: What is the Freezer and why does geth need it?
Pebble (like all LSM-tree engines) continuously does background compaction — merging and rewriting SST files across levels. For active data this is necessary, but for historical blocks (block #1 through millions of finalized blocks) that will never change again, repeatedly compacting them is pure waste.
The Freezer solves this by moving finalized blocks out of Pebble into simple append-only flat files:
```
Pebble (key-value store)           Freezer (flat files)
┌──────────────────┐              ┌──────────────────┐
│ Recent blocks    │   ── move →  │ headers file     │
│ Trie nodes       │              │ bodies file      │
│ Snapshots        │              │ receipts file    │
│ Contract code    │              │ hashes file      │
└──────────────────┘              └──────────────────┘
 Random read/write,                Append-only, O(1) read,
 compaction overhead               no compaction needed
```

Storage format
Each data type is a table with two files:
- Index file — fixed-size 6-byte entries (2-byte file number + 4-byte offset). To find entry N, read 6 bytes at position N×6.
- Data file — actual data blobs, optionally Snappy-compressed. Capped at 2 GB per file.
Reading is O(1): seek to index entry → read file number and offset → seek into data file → read blob. No tree traversal.
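To make the index layout concrete, a hedged sketch of decoding one entry (field names are illustrative; geth's freezer table code defines its own equivalent):

```go
// One 6-byte index entry: which data file a blob lives in, and its offset there.
type indexEntry struct {
	filenum uint16 // 2-byte data-file number
	offset  uint32 // 4-byte offset within that file
}

func decodeIndexEntry(b []byte) indexEntry {
	return indexEntry{
		filenum: binary.BigEndian.Uint16(b[0:2]),
		offset:  binary.BigEndian.Uint32(b[2:6]),
	}
}
```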
Background migration
A background goroutine in chainFreezer periodically runs:
1. Compute the freeze threshold (finalized block number)
2. Copy header/body/receipt/hash for blocks below the threshold into Freezer files
3. fsync the Freezer files (ensure data is on disk)
4. Batch-delete the frozen blocks from Pebble

Copy-then-delete with fsync in between ensures no data loss even if the process crashes mid-migration; see the sketch below.
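An illustrative sketch of that ordering, with made-up interfaces standing in for Pebble and the Freezer (the real logic lives in chainFreezer.freeze):

```go
// blockStore and ancientStore are hypothetical stand-ins for the Pebble-backed
// key-value store and the Freezer's append-only tables.
type blockStore interface {
	ReadBlockData(number uint64) []byte
	DeleteBlockRange(from, to uint64) error
}
type ancientStore interface {
	Append(number uint64, data []byte) error
	Sync() error // fsync the flat files
}

func freezeRange(kv blockStore, fr ancientStore, from, to uint64) error {
	// Steps 1–2: copy every block below the threshold into the append-only store.
	for n := from; n < to; n++ {
		if err := fr.Append(n, kv.ReadBlockData(n)); err != nil {
			return err
		}
	}
	// Step 3: fsync first, so the data provably exists in the Freezer.
	if err := fr.Sync(); err != nil {
		return err
	}
	// Step 4: only now delete from Pebble. A crash before this point leaves
	// blocks duplicated in both stores, never missing from both.
	return kv.DeleteBlockRange(from, to)
}
```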
Read order
When reading, Freezer is checked first (old canonical blocks are likely there with O(1) access), then Pebble is checked as fallback (for recent blocks or non-canonical fork blocks).
Q7: What is the complete write path for a new block, from execution to disk?
```
StateDB.Commit()
 ├─ Contract code → rawdb.WriteCode()  → Pebble: "c" + codeHash
 ├─ Trie nodes    → triedb batch write → Pebble: "A" + path
 └─ Snapshot      ────────────────────→ Pebble: "a" + accountHash

blockchain.writeBlock()
 ├─ Header   → rawdb.WriteHeader()   → Pebble: "h" + num + hash
 ├─ Body     → rawdb.WriteBody()     → Pebble: "b" + num + hash
 ├─ Receipts → rawdb.WriteReceipts() → Pebble: "r" + num + hash
 └─ Tx index → rawdb.WriteTxLookup() → Pebble: "l" + txHash

          ↓ (later, background goroutine)

chainFreezer.freeze()
 ├─ Copy header/body/receipt/hash to Freezer flat files
 ├─ fsync
 └─ Batch-delete frozen data from Pebble
```

All data initially goes to Pebble (fast async writes). Old finalized data is later migrated to the Freezer in the background, which saves space and removes compaction overhead. Reads check the Freezer first, then fall back to Pebble.