Geth(4) Account and State - Kehao Zheng's Website

Every Ethereum account — externally-owned accounts and contracts alike — lives in the world state. This chapter explains how geth represents accounts, organizes them in tries, caches reads and writes through StateDB, tracks changes with an undo log, and ultimately commits everything back to disk.

How State Flows Through a Block

When geth processes a block, state changes follow this pipeline:

 1. StateDB is created from the parent block's state root
       │
       ▼
 2. For each transaction:
    a. EVM reads state    →  StateDB.GetBalance(), GetState(), ...
    b. EVM writes state   →  StateDB.AddBalance(), SetState(), ...
    c. Changes land in dirty maps inside stateObject (not in the trie yet)
    d. If the tx reverts   →  journal replays undo entries
       │
       ▼
 3. After each transaction:
    StateDB.Finalise()  →  moves dirty storage to pending,
                            deletes empty/self-destructed accounts
       │
       ▼
 4. IntermediateRoot()  →  flushes pending storage into per-account tries,
                            updates account trie, returns new state root
       │
       ▼
 5. StateDB.Commit()   →  commits all tries, writes trie nodes + code +
                            snapshot updates to the database

The key insight: geth defers trie writes as long as possible. During execution, all mutations live in fast Go maps. Only when a state root is actually needed (IntermediateRoot(), and later Commit()) do they flow into the Merkle Patricia Tries covered in Chapter 03.

The Two-Trie Model

Ethereum’s state is organized as a trie-of-tries:

                    Account Trie
                  (world state root)
                   /     |     \
                  /      |      \
          Account A  Account B  Account C
          (EOA)      (contract)  (contract)
                        |           |
                   Storage Trie  Storage Trie
                    /    |         /    \
                 slot0 slot1    slot0  slot1

The account trie maps keccak256(address) → RLP-encoded account data. Its root hash is the Root field in every block header (see Chapter 02).
Each contract account has its own storage trie that maps keccak256(storageSlot) → RLP-encoded value. The storage trie’s root hash is stored inside the account data.

Both tries are StateTrie instances (key-hashing tries covered in Chapter 03).

The StateAccount Struct

Each account is represented on disk by a four-field struct in core/types/state_account.go:

type StateAccount struct {
    Nonce    uint64
    Balance  *uint256.Int
    Root     common.Hash // merkle root of the storage trie
    CodeHash []byte
}

Nonce: the transaction count (EOAs) or contract creation count (contracts). For an EOA it is incremented with each transaction sent from the account; for a contract, with each contract it creates.
Balance — the account’s ETH balance in wei. Uses uint256.Int (256-bit integer) rather than big.Int for performance.
Root — the root hash of this account’s storage trie. For accounts with no storage, this is types.EmptyRootHash.
CodeHash — the Keccak256 hash of the account’s contract bytecode. For EOAs (non-contract accounts), this is types.EmptyCodeHash.

The NewEmptyStateAccount() constructor shows the defaults:

func NewEmptyStateAccount() *StateAccount {
    return &StateAccount{
        Balance:  new(uint256.Int),
        Root:     EmptyRootHash,
        CodeHash: EmptyCodeHash.Bytes(),
    }
}

A new account starts with zero balance, the empty root hash, and the empty code hash. An account is considered “empty” (eligible for deletion under EIP-161) when all three of nonce, balance, and code hash equal their zero/empty values.

For storage in the snapshot layer, geth uses a slim RLP encoding (SlimAccount) that replaces the empty root and empty code hash with nil bytes, saving space for the common case of simple EOAs.
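The emptiness rule and the slim form can be sketched in a few lines. This is a minimal model, not geth's code: toyAccount, isEmpty, slim, and mustHex are hypothetical names, big.Int stands in for uint256.Int, and RLP encoding is left out entirely. The two constants are the well-known hashes of the empty trie and of empty code.

```go
package main

import (
	"bytes"
	"encoding/hex"
	"fmt"
	"math/big"
)

// mustHex decodes a hex string or panics (helper for the constants below).
func mustHex(s string) []byte {
	b, err := hex.DecodeString(s)
	if err != nil {
		panic(err)
	}
	return b
}

var (
	// Root hash of an empty trie.
	emptyRootHash = mustHex("56e81f171bcc55a6ff8345e692c0f86e5b48e01b996cadc001622fb5e363b421")
	// Keccak256 of empty input, i.e. the code hash of an account without code.
	emptyCodeHash = mustHex("c5d2460186f7233c927e7db2dcc703c0e500b653ca82273b7bfad8045d85a470")
)

// toyAccount is a hypothetical stand-in for StateAccount.
type toyAccount struct {
	Nonce    uint64
	Balance  *big.Int
	Root     []byte
	CodeHash []byte
}

func newEmptyToyAccount() *toyAccount {
	return &toyAccount{Balance: new(big.Int), Root: emptyRootHash, CodeHash: emptyCodeHash}
}

// isEmpty applies the three-part test: zero nonce, zero balance, empty code hash.
// The storage root is deliberately not part of the check.
func (a *toyAccount) isEmpty() bool {
	return a.Nonce == 0 && a.Balance.Sign() == 0 && bytes.Equal(a.CodeHash, emptyCodeHash)
}

// slim returns a copy in which the empty defaults are replaced by nil,
// mirroring the idea behind the slim snapshot format (encoding omitted).
func (a *toyAccount) slim() toyAccount {
	s := toyAccount{Nonce: a.Nonce, Balance: a.Balance}
	if !bytes.Equal(a.Root, emptyRootHash) {
		s.Root = a.Root
	}
	if !bytes.Equal(a.CodeHash, emptyCodeHash) {
		s.CodeHash = a.CodeHash
	}
	return s
}

func main() {
	acct := newEmptyToyAccount()
	fmt.Println(acct.isEmpty()) // true
	acct.Balance.SetUint64(1)
	fmt.Println(acct.isEmpty()) // false

	s := acct.slim()
	fmt.Println(s.Root == nil, s.CodeHash == nil) // true true
}
```

A funded EOA is no longer empty, but its slim form still drops both 32-byte hashes, which is where the space saving comes from.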

stateObject: The In-Memory Account

While StateAccount is the on-disk format, stateObject (in core/state/state_object.go) is the in-memory working copy. It wraps a StateAccount with caches, dirty tracking, and a reference back to its parent StateDB:

type stateObject struct {
    db       *StateDB
    address  common.Address
    addrHash common.Hash         // keccak256(address)
    origin   *types.StateAccount // original data before any changes, nil if new
    data     types.StateAccount  // current data with all mutations applied

    trie Trie   // storage trie, opened lazily on first access
    code []byte // contract bytecode, loaded on demand

    originStorage      Storage // storage values read from disk/trie
    dirtyStorage       Storage // storage modified in the current transaction
    pendingStorage     Storage // storage modified in the current block (across txs)
    uncommittedStorage Storage // storage modified since last commit, with original values

    dirtyCode      bool // true if code was updated
    selfDestructed bool // true if account was self-destructed
    newContract    bool // true if created in current tx (EIP-6780)
}

The Storage type is simply map[common.Hash]common.Hash.

The four storage maps form a layered cache:

| Map | Scope | Purpose |
| --- | --- | --- |
| `originStorage` | block | Values as read from disk. The “clean” baseline for comparison. |
| `dirtyStorage` | transaction | Values modified in the current transaction. Cleared after each `Finalise()`. |
| `pendingStorage` | block | Accumulated modifications across all transactions in the block. Used for trie updates. |
| `uncommittedStorage` | since last commit | Tracks which slots changed since the last trie commit, along with their original values. |

This layering lets geth handle mid-transaction reverts (clear dirtyStorage entries) without losing cross-transaction state (pendingStorage).
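A toy model makes the layering concrete. The sketch below assumes string keys and values instead of common.Hash, models only three of the four maps, and reverts a transaction by dropping the whole dirty map, whereas geth undoes journal entries one at a time. All names (toyObject, get, set, revertTx, finalise) are hypothetical.

```go
package main

import "fmt"

// toyObject models origin, dirty, and pending storage with plain string maps.
type toyObject struct {
	origin  map[string]string // clean values as read from disk
	dirty   map[string]string // writes of the current transaction
	pending map[string]string // writes of earlier transactions in this block
}

func newToyObject(disk map[string]string) *toyObject {
	return &toyObject{origin: disk, dirty: map[string]string{}, pending: map[string]string{}}
}

// get resolves a slot from the most recent layer that knows about it.
func (o *toyObject) get(key string) string {
	if v, ok := o.dirty[key]; ok {
		return v
	}
	if v, ok := o.pending[key]; ok {
		return v
	}
	return o.origin[key]
}

// set records a write in transaction scope only.
func (o *toyObject) set(key, value string) { o.dirty[key] = value }

// revertTx discards the current transaction's writes (a simplification of
// the journal-driven revert).
func (o *toyObject) revertTx() { o.dirty = map[string]string{} }

// finalise promotes the transaction's writes to block scope.
func (o *toyObject) finalise() {
	for k, v := range o.dirty {
		o.pending[k] = v
	}
	o.dirty = map[string]string{}
}

func main() {
	obj := newToyObject(map[string]string{"slot0": "a"})

	obj.set("slot0", "b") // tx 1 writes and succeeds
	obj.finalise()

	obj.set("slot0", "c") // tx 2 writes and then reverts
	obj.revertTx()

	fmt.Println(obj.get("slot0")) // b
}
```

The reverted write from the second transaction disappears, while the first transaction's result survives in the pending layer.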

StateDB: The Main API

StateDB (in core/state/statedb.go) is the central type that the EVM and all state-touching code use. It manages a collection of stateObjects and provides the public API for reading and writing accounts:

type StateDB struct {
    db         Database
    prefetcher *triePrefetcher
    reader     Reader
    trie       Trie              // the account trie, resolved on first access

    originalRoot common.Hash     // state root before any changes

    stateObjects         map[common.Address]*stateObject // live account objects
    stateObjectsDestruct map[common.Address]*stateObject // accounts deleted in this block
    mutations            map[common.Address]*mutation     // pending mutations per account

    dbErr  error   // first database error encountered
    refund uint64  // gas refund counter

    // Per-transaction state
    thash   common.Hash
    txIndex int
    logs    map[common.Hash][]*types.Log
    logSize uint

    // Per-block state
    preimages        map[common.Hash][]byte

    // Also per-transaction: both are reset when a new transaction is prepared
    accessList       *accessList
    transientStorage transientStorage

    // Undo log
    journal *journal
    // ...
}

Key fields:

db — the Database interface that provides access to tries and the snapshot layer. It bridges StateDB to the trie database from Chapter 03 and the storage stack from Chapter 05.
reader — a Reader interface with Account(addr) and Storage(addr, slot) methods for loading state from disk.
stateObjects — the live cache of all accounts accessed during this block. Once an account is loaded, it stays here for the duration of the block.
stateObjectsDestruct — accounts that were self-destructed or deleted (EIP-161 empty accounts). Stored separately so that storage lookups for destructed accounts return empty.
mutations — tracks which accounts have been modified and whether they were updated or deleted. Used during commit to know what to flush.
journal — the undo log that enables Snapshot() / RevertToSnapshot().

Creating a StateDB

func New(root common.Hash, db Database) (*StateDB, error) {
    reader, err := db.Reader(root)
    if err != nil {
        return nil, err
    }
    return NewWithReader(root, db, reader)
}

New takes a state root (typically the parent block’s state root) and a Database. It creates a Reader bound to that root, which will be used for all subsequent state lookups. The account trie itself is not opened yet. It is opened lazily, the first time it is needed; in the listings in this chapter that happens inside IntermediateRoot().

Reading State

All read operations follow the same pattern: look up the stateObject for the address, then read the field. For example:

func (s *StateDB) GetBalance(addr common.Address) *uint256.Int {
    stateObject := s.getStateObject(addr)
    if stateObject != nil {
        return stateObject.Balance()
    }
    return common.U2560
}

The interesting work is in getStateObject, which implements a multi-layer lookup:

func (s *StateDB) getStateObject(addr common.Address) *stateObject {
    // 1. Check the in-memory cache first
    if obj := s.stateObjects[addr]; obj != nil {
        return obj
    }
    // 2. If destructed in this block, return nil
    if _, ok := s.stateObjectsDestruct[addr]; ok {
        return nil
    }
    // 3. Load from the reader (snapshot or trie)
    acct, err := s.reader.Account(addr)
    if err != nil {
        s.setError(...)
        return nil
    }
    if acct == nil {
        return nil
    }
    // 4. Wrap in stateObject and cache
    obj := newObject(s, addr, acct)
    s.setStateObject(obj)
    return obj
}

The lookup chain:

1. In-memory cache (stateObjects map): O(1) hash map lookup. Once loaded, accounts stay cached for the entire block.
2. Destruction check: if the account was deleted in this block, return nil immediately. This prevents reading stale disk data for a destroyed account.
3. Reader: calls s.reader.Account(addr), which reads from the snapshot layer (if available) or falls back to the trie. The Reader interface abstracts this.
4. Cache and return: the loaded account is wrapped in a stateObject and inserted into the cache.
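The effect of this chain is easy to observe with a reader that counts its calls. The following is a sketch under simplifying assumptions: addresses are strings, the account is a single balance, and toyStateDB, countingReader, and getObject are hypothetical names rather than geth APIs.

```go
package main

import "fmt"

type toyAccount struct{ Balance uint64 }

// accountReader is a hypothetical stand-in for the Account half of Reader.
type accountReader interface {
	Account(addr string) (*toyAccount, error)
}

// countingReader serves accounts from a map and counts how often it is hit.
type countingReader struct {
	disk  map[string]*toyAccount
	reads int
}

func (r *countingReader) Account(addr string) (*toyAccount, error) {
	r.reads++
	return r.disk[addr], nil // nil account means "does not exist"
}

type toyStateDB struct {
	reader     accountReader
	objects    map[string]*toyAccount // live cache
	destructed map[string]struct{}    // deleted earlier in this block
	err        error                  // first reader error, if any
}

func newToyStateDB(r accountReader) *toyStateDB {
	return &toyStateDB{reader: r, objects: map[string]*toyAccount{}, destructed: map[string]struct{}{}}
}

func (s *toyStateDB) getObject(addr string) *toyAccount {
	if obj := s.objects[addr]; obj != nil { // 1. cache
		return obj
	}
	if _, ok := s.destructed[addr]; ok { // 2. destruction check
		return nil
	}
	acct, err := s.reader.Account(addr) // 3. reader
	if err != nil {
		if s.err == nil {
			s.err = err
		}
		return nil
	}
	if acct == nil {
		return nil
	}
	s.objects[addr] = acct // 4. cache and return
	return acct
}

func main() {
	r := &countingReader{disk: map[string]*toyAccount{"0xaa": {Balance: 7}, "0xbb": {Balance: 9}}}
	db := newToyStateDB(r)
	db.destructed["0xbb"] = struct{}{}

	db.getObject("0xaa")
	db.getObject("0xaa")
	fmt.Println(r.reads) // 1

	fmt.Println(db.getObject("0xbb") == nil) // true
	fmt.Println(r.reads)                     // 1
}
```

The second lookup of 0xaa is served from the cache, and the destructed 0xbb never reaches the reader even though stale data for it still sits on "disk".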

Reading Storage

Storage reads follow a similar layered pattern inside stateObject:

func (s *stateObject) GetCommittedState(key common.Hash) common.Hash {
    if value, pending := s.pendingStorage[key]; pending {
        return value
    }
    if value, cached := s.originStorage[key]; cached {
        return value
    }
    if _, destructed := s.db.stateObjectsDestruct[s.address]; destructed {
        s.originStorage[key] = common.Hash{}
        return common.Hash{}
    }
    // Load from reader (snapshot/trie)
    value, err := s.db.reader.Storage(s.address, key)
    // ...
    s.originStorage[key] = value
    return value
}

func (s *stateObject) getState(key common.Hash) (common.Hash, common.Hash) {
    origin := s.GetCommittedState(key)
    value, dirty := s.dirtyStorage[key]
    if dirty {
        return value, origin
    }
    return origin, origin
}

getState returns two values: the current value and the committed value (the slot's value at the start of the transaction). Callers need both, because SSTORE gas metering (EIP-2200) compares the current value against the committed one. That is why GetCommittedState always runs, even when a dirty value exists.

The committed value is resolved through these layers:

pendingStorage — values written in earlier transactions within this block.
originStorage — values previously loaded from disk (a read cache).
reader.Storage() — loads from the snapshot layer or trie on disk.

Then getState checks dirtyStorage — if a value was written in the current transaction, it overrides the committed value as the “current” return. Otherwise, the committed value is returned for both.

Values loaded from disk are cached in originStorage for future reads.
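To see why both return values matter, here is a small sketch that resolves a slot through the same layers and then labels a prospective write the way net gas metering distinguishes cases. It is an illustration only: the toyObject type, the classifyWrite helper, and its labels are hypothetical, string values replace common.Hash, and no actual gas costs are modelled.

```go
package main

import "fmt"

type toyObject struct {
	dirty     map[string]string // written in the current transaction
	pending   map[string]string // written by earlier transactions in the block
	origin    map[string]string // read cache of on-disk values
	disk      map[string]string // stand-in for reader.Storage
	diskReads int
}

// committedState resolves pending -> origin -> disk and caches disk reads.
func (o *toyObject) committedState(key string) string {
	if v, ok := o.pending[key]; ok {
		return v
	}
	if v, ok := o.origin[key]; ok {
		return v
	}
	o.diskReads++
	v := o.disk[key]
	o.origin[key] = v
	return v
}

// getState returns (current, committed); a dirty entry overrides only "current".
func (o *toyObject) getState(key string) (current, committed string) {
	committed = o.committedState(key)
	if v, ok := o.dirty[key]; ok {
		return v, committed
	}
	return committed, committed
}

// classifyWrite is a hypothetical helper: it only names the three situations
// that a metering rule comparing current and committed values can tell apart.
func classifyWrite(current, committed, newValue string) string {
	switch {
	case current == newValue:
		return "no-op"
	case current == committed:
		return "first write to a clean slot"
	default:
		return "rewrite of an already-dirty slot"
	}
}

func main() {
	obj := &toyObject{
		dirty:   map[string]string{},
		pending: map[string]string{},
		origin:  map[string]string{},
		disk:    map[string]string{"slot0": "a"},
	}

	cur, com := obj.getState("slot0")
	fmt.Println(classifyWrite(cur, com, "b")) // first write to a clean slot

	obj.dirty["slot0"] = "b"
	cur, com = obj.getState("slot0")
	fmt.Println(cur, com, obj.diskReads)      // b a 1
	fmt.Println(classifyWrite(cur, com, "c")) // rewrite of an already-dirty slot
}
```

The disk is read once; the second getState call is served from the origin cache, and the dirty value changes only the "current" half of the pair.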

Writing State

Write operations also go through StateDB, which delegates to stateObject:

func (s *StateDB) AddBalance(addr common.Address, amount *uint256.Int, reason tracing.BalanceChangeReason) uint256.Int {
    stateObject := s.getOrNewStateObject(addr)
    if stateObject == nil {
        return uint256.Int{}
    }
    return stateObject.AddBalance(amount)
}

func (s *StateDB) SetState(addr common.Address, key, value common.Hash) common.Hash {
    if stateObject := s.getOrNewStateObject(addr); stateObject != nil {
        return stateObject.SetState(key, value)
    }
    return common.Hash{}
}

getOrNewStateObject loads an existing account or creates a new empty one. The actual mutation happens inside stateObject. For balances, stateObject.AddBalance computes the new total and delegates to SetBalance:

func (s *stateObject) SetBalance(amount *uint256.Int) uint256.Int {
    prev := *s.data.Balance
    s.db.journal.balanceChange(s.address, s.data.Balance)
    s.setBalance(amount)
    return prev
}

func (s *stateObject) SetState(key, value common.Hash) common.Hash {
    prev, origin := s.getState(key)
    if prev == value {
        return prev
    }
    s.db.journal.storageChange(s.address, key, prev, origin)
    s.setState(key, value, origin)
    return prev
}

Every write follows the same two-step pattern:

1. Journal the change: record the previous value in the journal so it can be undone on revert.
2. Apply the mutation: update the in-memory field (data.Balance) or dirty map (dirtyStorage).

The setState helper has a subtle optimization: if the new value equals the original (pre-transaction) value, the key is removed from dirtyStorage entirely. This means “set back to original” is a no-op from the trie’s perspective.

func (s *stateObject) setState(key common.Hash, value common.Hash, origin common.Hash) {
    if value == origin {
        delete(s.dirtyStorage, key)
        return
    }
    s.dirtyStorage[key] = value
}

The Journal: Snapshot and Revert

The EVM needs to undo state changes when a transaction reverts (out of gas, REVERT opcode, etc.). Geth handles this with a journal — an append-only log of undo entries.

The journal is defined in core/state/journal.go:

type journalEntry interface {
    revert(*StateDB)
    dirtied() *common.Address
    copy() journalEntry
}

type journal struct {
    entries []journalEntry         // list of undo entries
    dirties map[common.Address]int // dirty accounts and their change count

    validRevisions []revision
    nextRevisionId int
}

type revision struct {
    id           int
    journalIndex int
}

Each state mutation (balance change, storage write, nonce update, etc.) appends a journalEntry that knows how to undo itself. The concrete entry types are defined in the same file:

type balanceChange struct {
    account common.Address
    prev    *uint256.Int
}

type storageChange struct {
    account   common.Address
    key       common.Hash
    prevvalue common.Hash
    origvalue common.Hash
}

type nonceChange struct {
    account common.Address
    prev    uint64
}
// ... plus codeChange, createObjectChange, selfDestructChange, etc.

Each entry stores just enough data to undo the change — typically the previous value.

Snapshot and RevertToSnapshot

The EVM takes a snapshot before each internal call. If the call fails, it reverts to the snapshot:

func (s *StateDB) Snapshot() int {
    return s.journal.snapshot()
}

func (s *StateDB) RevertToSnapshot(revid int) {
    s.journal.revertToSnapshot(revid, s)
}

snapshot() records the current journal length as a revision and returns its ID. revertToSnapshot() looks up the journal length recorded for that ID and hands it to revert(), which walks the journal entries from the current end back to that length, calling revert() on each:

func (j *journal) revert(statedb *StateDB, snapshot int) {
    for i := len(j.entries) - 1; i >= snapshot; i-- {
        j.entries[i].revert(statedb)

        if addr := j.entries[i].dirtied(); addr != nil {
            if j.dirties[*addr]--; j.dirties[*addr] == 0 {
                delete(j.dirties, *addr)
            }
        }
    }
    j.entries = j.entries[:snapshot]
}

The reversal walks backward through the journal entries, undoing each change. The dirties map is also adjusted — if an account’s change count drops to zero, it’s removed from the dirty set entirely.
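The whole mechanism fits in a short, self-contained model. The sketch below is an assumption-laden simplification: undo entries are closures instead of typed structs, addresses are strings, only balances are journaled, and the revision lookup is a linear scan. The names toyState, record, and setBalance are hypothetical.

```go
package main

import "fmt"

// entry is a hypothetical undo record: which account it dirtied and how to undo it.
type entry struct {
	addr string
	undo func()
}

type revision struct{ id, journalIndex int }

type journal struct {
	entries        []entry
	dirties        map[string]int // account -> number of live journal entries
	validRevisions []revision
	nextRevisionID int
}

func (j *journal) record(addr string, undo func()) {
	j.entries = append(j.entries, entry{addr: addr, undo: undo})
	j.dirties[addr]++
}

// snapshot remembers the current journal length under a fresh ID.
func (j *journal) snapshot() int {
	id := j.nextRevisionID
	j.nextRevisionID++
	j.validRevisions = append(j.validRevisions, revision{id: id, journalIndex: len(j.entries)})
	return id
}

// revertToSnapshot undoes every entry recorded after the given snapshot,
// newest first, and invalidates that snapshot and all later ones.
func (j *journal) revertToSnapshot(id int) {
	idx := -1
	for i, r := range j.validRevisions {
		if r.id == id {
			idx = i
			break
		}
	}
	if idx < 0 {
		panic(fmt.Sprintf("revision id %d cannot be reverted", id))
	}
	target := j.validRevisions[idx].journalIndex
	for i := len(j.entries) - 1; i >= target; i-- {
		e := j.entries[i]
		e.undo()
		if j.dirties[e.addr]--; j.dirties[e.addr] == 0 {
			delete(j.dirties, e.addr)
		}
	}
	j.entries = j.entries[:target]
	j.validRevisions = j.validRevisions[:idx]
}

type toyState struct {
	balances map[string]uint64
	journal  *journal
}

// setBalance journals the previous value first, then applies the change.
func (s *toyState) setBalance(addr string, v uint64) {
	prev, existed := s.balances[addr]
	s.journal.record(addr, func() {
		if existed {
			s.balances[addr] = prev
		} else {
			delete(s.balances, addr)
		}
	})
	s.balances[addr] = v
}

func main() {
	s := &toyState{balances: map[string]uint64{"alice": 10}, journal: &journal{dirties: map[string]int{}}}

	s.setBalance("alice", 8) // happens before the call, must survive
	snap := s.journal.snapshot()
	s.setBalance("alice", 3) // inside the failing call
	s.setBalance("bob", 5)
	s.journal.revertToSnapshot(snap)

	fmt.Println(s.balances["alice"], len(s.balances), len(s.journal.dirties)) // 8 1 1
}
```

After the revert, alice is back at 8, bob no longer exists, and alice is still counted as dirty because of the write made before the snapshot.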

Finalise and IntermediateRoot

After each transaction, Finalise() promotes dirty storage to pending and cleans up:

func (s *StateDB) Finalise(deleteEmptyObjects bool) {
    for addr := range s.journal.dirties {
        obj, exist := s.stateObjects[addr]
        if !exist {
            continue
        }
        if obj.selfDestructed || (deleteEmptyObjects && obj.empty()) {
            delete(s.stateObjects, obj.address)
            s.markDelete(addr)
            if _, ok := s.stateObjectsDestruct[obj.address]; !ok {
                s.stateObjectsDestruct[obj.address] = obj
            }
        } else {
            obj.finalise()
            s.markUpdate(addr)
        }
        // ...
    }
    // ...
}

For each account that was dirtied during the transaction:

If self-destructed or empty (EIP-161): delete it from the live set and record it in stateObjectsDestruct.
Otherwise: call obj.finalise(), which moves dirtyStorage entries into pendingStorage and clears the dirty map.
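The account-level half of this logic can be sketched as follows. It is a simplified model, not geth's implementation: accounts are reduced to a few fields, mutations are recorded as the strings "update" and "delete", and the toy resets its dirty set at the end so the next transaction starts clean (an assumption about the elided part of the listing above). All type and field names are hypothetical.

```go
package main

import "fmt"

type toyObject struct {
	nonce, balance uint64
	hasCode        bool
	selfDestructed bool
}

// empty mirrors the three-part emptiness test (nonce, balance, code).
func (o *toyObject) empty() bool { return o.nonce == 0 && o.balance == 0 && !o.hasCode }

type toyStateDB struct {
	objects    map[string]*toyObject // live accounts
	destructed map[string]*toyObject // accounts removed in this block
	mutations  map[string]string     // addr -> "update" or "delete"
	dirties    map[string]int        // stand-in for journal.dirties
}

func (s *toyStateDB) finalise(deleteEmpty bool) {
	for addr := range s.dirties {
		obj, ok := s.objects[addr]
		if !ok {
			continue
		}
		if obj.selfDestructed || (deleteEmpty && obj.empty()) {
			delete(s.objects, addr)
			s.mutations[addr] = "delete"
			// Keep the first destructed version only.
			if _, seen := s.destructed[addr]; !seen {
				s.destructed[addr] = obj
			}
		} else {
			s.mutations[addr] = "update"
		}
	}
	s.dirties = map[string]int{} // assumption: dirty tracking restarts per transaction
}

func main() {
	db := &toyStateDB{
		objects: map[string]*toyObject{
			"0xaa": {balance: 5},
			"0xbb": {}, // touched but empty
			"0xcc": {balance: 1, selfDestructed: true},
		},
		destructed: map[string]*toyObject{},
		mutations:  map[string]string{},
		dirties:    map[string]int{"0xaa": 1, "0xbb": 1, "0xcc": 2},
	}
	db.finalise(true)

	fmt.Println(len(db.objects), len(db.destructed))                               // 1 2
	fmt.Println(db.mutations["0xaa"], db.mutations["0xbb"], db.mutations["0xcc"]) // update delete delete
}
```

With deleteEmpty set to false, the empty account 0xbb would stay in the live set and be marked as an update instead.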

IntermediateRoot() goes one step further — it flushes pending storage into the actual tries and updates the account trie:

// core/state/statedb.go (simplified)

func (s *StateDB) IntermediateRoot(deleteEmptyObjects bool) common.Hash {
    s.Finalise(deleteEmptyObjects)

    // Open the account trie if not yet loaded
    if s.trie == nil {
        tr, err := s.db.OpenTrie(s.originalRoot)
        // ...
        s.trie = tr
    }
    // Phase 1: Update each account's storage trie (concurrently)
    var workers errgroup.Group
    for addr, op := range s.mutations {
        if op.applied || op.isDelete() {
            continue
        }
        obj := s.stateObjects[addr]
        workers.Go(func() error {
            obj.updateRoot()
            return nil
        })
    }
    workers.Wait()

    // Phase 2: Write account data into the account trie
    var deletedAddrs []common.Address
    for addr, op := range s.mutations {
        // ...
        if op.isDelete() {
            deletedAddrs = append(deletedAddrs, addr)
        } else {
            s.updateStateObject(s.stateObjects[addr])
        }
    }
    for _, deletedAddr := range deletedAddrs {
        s.deleteStateObject(deletedAddr)
    }
    return s.trie.Hash()
}

The function has two phases:

1. Storage tries: updateRoot() on each stateObject walks uncommittedStorage (the slots changed since the last trie update), takes each slot's latest value from pendingStorage, and applies it to the storage trie via Trie.UpdateStorage() / Trie.DeleteStorage(). It then calls trie.Hash() to compute the new storage root. This runs concurrently for all mutated accounts.
2. Account trie: after all storage roots are updated, each mutated account’s data (including the new Root) is written into the account trie via updateStateObject(). Deleted accounts are removed via deleteStateObject(). Updates are applied before deletions to avoid unnecessary trie node resolution.

The result is a new state root hash without committing to disk — this is used to set the block header’s Root field during block processing.
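The two-phase shape can be sketched without any trie code. In the model below a SHA-256 hash over sorted key/value pairs stands in for a trie root (it is not a Merkle Patricia Trie), sync.WaitGroup stands in for errgroup, and toyAccount, hashMap, and intermediateRoot are hypothetical names.

```go
package main

import (
	"crypto/sha256"
	"fmt"
	"sort"
	"sync"
)

type toyAccount struct {
	balance uint64
	storage map[string]string
	root    [32]byte // stand-in for the storage trie root
}

// hashMap is a stand-in for a trie hash: a digest over sorted key/value pairs.
func hashMap(m map[string]string) [32]byte {
	keys := make([]string, 0, len(m))
	for k := range m {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	h := sha256.New()
	for _, k := range keys {
		fmt.Fprintf(h, "%q=%q;", k, m[k])
	}
	var out [32]byte
	copy(out[:], h.Sum(nil))
	return out
}

func intermediateRoot(accounts map[string]*toyAccount) [32]byte {
	// Phase 1: recompute every storage root, one goroutine per account.
	// Each goroutine writes only to its own account, so no locking is needed.
	var wg sync.WaitGroup
	for _, acct := range accounts {
		wg.Add(1)
		go func(a *toyAccount) {
			defer wg.Done()
			a.root = hashMap(a.storage)
		}(acct)
	}
	wg.Wait()

	// Phase 2: fold the account data, including the fresh storage roots,
	// into the account-level digest.
	leaves := make(map[string]string, len(accounts))
	for addr, a := range accounts {
		leaves[addr] = fmt.Sprintf("%d:%x", a.balance, a.root)
	}
	return hashMap(leaves)
}

func main() {
	accounts := map[string]*toyAccount{
		"0xaa": {balance: 1, storage: map[string]string{"slot0": "a"}},
		"0xbb": {balance: 2, storage: map[string]string{}},
	}
	before := intermediateRoot(accounts)
	accounts["0xaa"].storage["slot0"] = "b"
	after := intermediateRoot(accounts)

	fmt.Println(before != after) // true
}
```

Phase 2 cannot start until phase 1 has finished, because each account leaf embeds its storage root; that dependency is the reason for the explicit wait between the phases.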

Commit: Flushing to Disk

Commit() writes all state changes to the database. It’s called once per block after all transactions are processed:

func (s *StateDB) Commit(block uint64, deleteEmptyObjects bool, noStorageWiping bool) (common.Hash, error) {
    ret, err := s.commitAndFlush(block, deleteEmptyObjects, noStorageWiping)
    // ...
    return ret.root, nil
}

The inner commit() method orchestrates the work:

1. IntermediateRoot(): finalise all pending changes and flush into tries.
2. Handle destructions: process account deletions first (storage trie cleanup).
3. Commit account trie: s.trie.Commit(true) is scheduled first since the account trie is the largest.
4. Commit storage tries: for each mutated account, obj.commit() commits the storage trie and returns a NodeSet of dirty trie nodes. These run concurrently with each other and with the account trie commit via errgroup.
5. Merge all NodeSets: all dirty trie nodes (account + storage) are merged into a single MergedNodeSet.

The commitAndFlush() wrapper then persists the results:

// core/state/statedb.go (commitAndFlush, simplified)

func (s *StateDB) commitAndFlush(block uint64, ...) (*stateUpdate, error) {
    ret, err := s.commit(deleteEmptyObjects, noStorageWiping, block)
    // ...
    // Write contract code to disk
    // (db here is the underlying key-value store; obtaining it is elided in this listing)
    if len(ret.codes) > 0 {
        batch := db.NewBatch()
        for _, code := range ret.codes {
            rawdb.WriteCode(batch, code.hash, code.blob)
        }
        batch.Write()
    }
    // Update snapshot tree
    if snap := s.db.Snapshot(); snap != nil {
        snap.Update(ret.root, ret.originRoot, ret.accounts, ret.storages)
        snap.Cap(ret.root, TriesInMemory)
    }
    // Write trie nodes to the trie database
    if db := s.db.TrieDB(); db != nil {
        db.Update(ret.root, ret.originRoot, block, ret.nodes, ret.stateSet())
    }
    return ret, nil
}

Three things are written:

Contract code — new or modified bytecode is written via rawdb.WriteCode() in a batch.
Snapshot layer — the flat state snapshot is updated with the account and storage diffs. Cap() keeps at most TriesInMemory (128) diff layers in memory.
Trie database — all dirty trie nodes are submitted to triedb.Database.Update(), which either adds them to the hashdb cache or creates a new pathdb diff layer (see Chapter 03).

After commit, the StateDB is effectively consumed — its tries are committed and no longer usable. A new StateDB must be created from the new root for subsequent blocks.

The Read Path: Reader Interface

The Reader interface abstracts how state is loaded from disk. It combines two sub-interfaces:

type StateReader interface {
    Account(addr common.Address) (*types.StateAccount, error)
    Storage(addr common.Address, slot common.Hash) (common.Hash, error)
}

type ContractCodeReader interface {
    Code(addr common.Address, codeHash common.Hash) ([]byte, error)
    CodeSize(addr common.Address, codeHash common.Hash) (int, error)
}

type Reader interface {
    ContractCodeReader
    StateReader
}

The Reader is created by db.Reader(root) when StateDB is constructed. Depending on the configuration, the implementation may read from the snapshot layer first (O(1) flat lookups) and fall back to the trie (O(log n) path traversal) only when the snapshot cannot answer, for example because it is unavailable or returns an error. This is why snapshot lookups appear in the read path before trie lookups.

What’s Next

With accounts and state covered, we now understand how geth manages the in-memory representation of Ethereum’s world state. Chapter 05 — The Storage Stack completes the picture by tracing the full path from StateDB through the trie database down to bytes on disk.
