अभिव्यक्ति

Building Dynamo from scratch in Go: Part 1

The Big Picture


This is Part 1 of a 6-part series where we walk through a from-scratch implementation of Amazon's Dynamo, the distributed key-value store described in the SOSP 2007 paper. By the end of the series, you'll understand every major subsystem well enough to build your own or extend this one. Each post in this series corresponds to a logical layer of the implementation - you can follow along by reading the packages in order.


Why Dynamo?

Amazon's Dynamo paper, published at SOSP 2007, is one of the pioneering papers in the wave of distributed storage systems that came out of Big Tech. In the years that followed, many teams wrote their own distributed databases, and quite a few were open sourced. One of the defining traits of the paper is that it presents a toolbox of distributed systems techniques and shows how to compose them into a highly available key-value store. Amazon built Dynamo to serve its own needs, and it was eventually superseded by the publicly available DynamoDB. The lessons here, however, are valuable for anyone working at a deeper level in the distributed systems arena, or for anyone who ever needs to build a distributed database for their own use case.

Essentially, Amazon needed a key-value store that was always writeable — one that would accept writes even if disks were failing, network routes were flapping, or entire data centers were unreachable. The paper makes the design philosophy explicit: a customer should be able to add items to their shopping cart even during infrastructure failures. Rejecting a write is never acceptable.

This is a deliberate choice to favor availability over consistency. In the vocabulary of the CAP theorem, Dynamo is an AP system. When a network partition happens, it chooses to remain available (accepting reads and writes) rather than consistent (guaranteeing all nodes agree on the same value). The trade-off is that two clients might read different versions of the same data for a brief window - and the system provides mechanisms for the application to detect and resolve these conflicts.

The paper describes this as an eventually consistent data store: all updates reach all replicas eventually, but at any given instant, different replicas may hold different versions.

The Design Principles

Four principles guide every decision in Dynamo's design:

Incremental scalability. You can add a single storage node at a time without downtime or system-wide reorganization. The new node takes on a proportional share of the load and data, and the rest of the cluster barely notices.

Symmetry. Every node has the same role and responsibilities. There is no master, no leader, no special node. This is crucial - a distinguished node is a single point of failure, and eliminating it makes the system fundamentally more resilient.

Decentralization. This follows from symmetry. Instead of a central coordinator deciding who owns what data or which nodes are alive, nodes use peer-to-peer techniques - gossip protocols, consistent hashing, decentralized failure detection - to collectively maintain a shared understanding of the cluster.

Heterogeneity. Not all nodes need to be identical. A beefy machine can take on more responsibility than a smaller one. The system accommodates this through virtual nodes, which we'll explore in Part 2.

The Techniques at a Glance

Dynamo is a synthesis of well-known distributed systems techniques, each solving a specific problem. Here's the map - each technique gets its own deep-dive in a later post, but it helps to see the full picture upfront.

Problem | Technique | Paper Section | Our Series
Partitioning data across nodes | Consistent hashing with virtual nodes | §4.1–4.2 | Part 2
Detecting conflicts between replicas | Vector clocks | §4.4 | Part 3
Resolving conflicts | Application-level reconciliation | §4.4 | Part 3
Tunable consistency for reads/writes | Quorum (N, R, W) | §4.5–4.6 | Part 4
Handling temporary node failures | Hinted handoff + sloppy quorum | §4.6 | Part 4
Cluster membership | Gossip protocol | §4.8 | Part 5
Permanent failure detection | Phi accrual failure detector | §4.8 | Part 5
Replica synchronization | Merkle trees + anti-entropy | §4.7 | Part 5
Performance under load | Admission control, coordinator selection | §6 | Part 6

The elegance of Dynamo is that each technique is relatively simple on its own, but they compose into a system with powerful properties. Our implementation follows this same philosophy - each subsystem is its own Go package with clean boundaries, and they're wired together at the Node level.

Starting Simple: A Single-Node Key-Value Store

Before we think about distributing data across a cluster, let's build the simplest possible foundation: a single node that can store and retrieve key-value pairs. This is the seed from which everything else will grow.

The Storage Interface

Every Dynamo node needs a local storage engine - a place to actually persist data on a single machine. Different use cases call for different engines (an in-memory store for testing, a disk-backed store for production), so we define an interface:

// pkg/storage/interface.go

// ErrKeyNotFound is the sentinel error Get returns when a key has no value.
var ErrKeyNotFound = errors.New("key not found")

type Storage interface {
    Get(key string) ([]byte, error)
    Put(key string, value []byte) error
    Delete(key string) error
    Close() error
}

This is intentionally minimal. A key maps to a byte slice. No transactions, no range queries, no schemas. The paper is explicit about this - Dynamo targets services that only need primary-key access and store relatively small objects (under 1 MB).
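That small-object assumption can be made explicit with a thin decorator over the interface. This is just a sketch for illustration, not part of the series' code - the sizeLimited wrapper, the mapStore demo backend, and the exact 1 MB constant are all assumptions:

```go
package main

import (
	"errors"
	"fmt"
)

// Storage mirrors the interface above (Delete and Close omitted for brevity).
type Storage interface {
	Get(key string) ([]byte, error)
	Put(key string, value []byte) error
}

// maxValueSize reflects the paper's small-object guidance.
const maxValueSize = 1 << 20 // ~1 MB

// sizeLimited is a hypothetical decorator that rejects oversized values
// before they reach the inner engine.
type sizeLimited struct {
	inner Storage
}

func (s sizeLimited) Get(key string) ([]byte, error) {
	return s.inner.Get(key)
}

func (s sizeLimited) Put(key string, value []byte) error {
	if len(value) > maxValueSize {
		return fmt.Errorf("value for %q is %d bytes, over the %d-byte limit",
			key, len(value), maxValueSize)
	}
	return s.inner.Put(key, value)
}

// mapStore is a throwaway in-memory Storage for the demo.
type mapStore map[string][]byte

func (m mapStore) Get(key string) ([]byte, error) {
	v, ok := m[key]
	if !ok {
		return nil, errors.New("key not found")
	}
	return v, nil
}

func (m mapStore) Put(key string, value []byte) error {
	m[key] = value
	return nil
}

func main() {
	s := sizeLimited{inner: mapStore{}}
	fmt.Println(s.Put("small", []byte("ok")))                      // <nil>
	fmt.Println(s.Put("big", make([]byte, maxValueSize+1)) != nil) // true
}
```

Because the wrapper satisfies the same interface, the rest of the system never needs to know the limit exists.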

Note: In the final implementation, Get returns []VersionedValue and Put takes a VersionedValue instead of raw bytes - we'll evolve to that in Part 3 when we introduce vector clocks. For now, thinking in terms of raw bytes keeps things simple.
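To make that note concrete, here is a rough preview of the shape things take in Part 3. The field and type names here are assumptions for illustration; the real definitions arrive with vector clocks:

```go
package main

import "fmt"

// VectorClock maps a node ID to a logical counter; Part 3 covers it properly.
type VectorClock map[string]uint64

// VersionedValue pairs raw bytes with the causality metadata used to
// detect conflicting writes across replicas.
type VersionedValue struct {
	Value []byte
	Clock VectorClock
}

func main() {
	// A value written once by node1. A future Get(key) returns a slice of
	// these - one per concurrent version - leaving reconciliation to the caller.
	v := VersionedValue{
		Value: []byte("Alice"),
		Clock: VectorClock{"node1": 1},
	}
	fmt.Printf("%s written with clock %v\n", v.Value, v.Clock)
}
```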

An In-Memory Implementation

The simplest storage engine is a map guarded by a read-write mutex:

// pkg/storage/memory.go

type MemoryStorage struct {
    data map[string][]byte
    mu   sync.RWMutex
}

func NewMemoryStorage() *MemoryStorage {
    return &MemoryStorage{
        data: make(map[string][]byte),
    }
}

func (m *MemoryStorage) Get(key string) ([]byte, error) {
    m.mu.RLock()
    defer m.mu.RUnlock()

    value, exists := m.data[key]
    if !exists {
        return nil, ErrKeyNotFound
    }
    // Return a copy so callers can't mutate the map's backing slice
    // after the lock is released.
    return append([]byte(nil), value...), nil
}

func (m *MemoryStorage) Put(key string, value []byte) error {
    m.mu.Lock()
    defer m.mu.Unlock()
    m.data[key] = value
    return nil
}

func (m *MemoryStorage) Delete(key string) error {
    m.mu.Lock()
    defer m.mu.Unlock()
    delete(m.data, key)
    return nil
}

func (m *MemoryStorage) Close() error {
    return nil
}

Nothing surprising here - and that's the point. The storage engine is a solved problem at the single-node level. What makes Dynamo interesting is everything built on top of this: how data is partitioned, versioned, replicated, and repaired across many such engines. This simple map will carry us surprisingly far.
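One property worth seeing in action is what the RWMutex buys us: any number of readers proceed in parallel, and writers get exclusive access. Here's a standalone stress sketch - definitions repeated (with slightly simplified signatures) so it runs on its own:

```go
package main

import (
	"fmt"
	"sync"
)

type MemoryStorage struct {
	data map[string][]byte
	mu   sync.RWMutex
}

func NewMemoryStorage() *MemoryStorage {
	return &MemoryStorage{data: make(map[string][]byte)}
}

func (m *MemoryStorage) Get(key string) ([]byte, bool) {
	m.mu.RLock()
	defer m.mu.RUnlock()
	v, ok := m.data[key]
	return v, ok
}

func (m *MemoryStorage) Put(key string, value []byte) {
	m.mu.Lock()
	defer m.mu.Unlock()
	m.data[key] = value
}

func main() {
	s := NewMemoryStorage()
	var wg sync.WaitGroup

	// 100 writers and 100 readers hammering the same key. The RWMutex
	// keeps this free of data races - try `go run -race` to confirm.
	for i := 0; i < 100; i++ {
		wg.Add(2)
		go func() {
			defer wg.Done()
			s.Put("counter", []byte("x"))
		}()
		go func() {
			defer wg.Done()
			s.Get("counter")
		}()
	}
	wg.Wait()

	v, ok := s.Get("counter")
	fmt.Println(ok, string(v)) // true x
}
```

Without the mutex, the Go runtime would flag concurrent map access and crash; with it, this pattern is exactly what a node under concurrent client load looks like in miniature.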

The actual implementation supports pluggable backends - you can swap in a MemoryStorage for testing, an LSS (Log-Structured Storage) engine for production, or adapters for BoltDB and BadgerDB. They all satisfy the same interface, so the rest of the system doesn't care which one is underneath.

// NewStorage selects a backend by name. The disk-backed engines also take
// a path and engine-specific options; that plumbing is elided here.
func NewStorage(engineType string) (Storage, error) {
    switch engineType {
    case "memory":
        return NewMemoryStorage(), nil
    case "lss":
        return NewLSSEngine(config) // config: engine options, wiring elided
    case "boltdb":
        return NewBoltDBStorage(path) // path: data file location, wiring elided
    default:
        return NewMemoryStorage(), nil
    }
}

The Node

Now we can define our first version of the Node struct - the central type that represents a single Dynamo instance:

// pkg/dynamo/node.go

type Node struct {
    id      string
    address string
    config  *Config
    storage storage.Storage

    stopCh chan struct{}
    mu     sync.RWMutex
}

Four fields of substance, and two for lifecycle management. That's it. The id uniquely identifies this node in the cluster (e.g., "node1"). The address is its network endpoint (e.g., "localhost:8001"). The config holds tuning parameters. And storage is the local storage engine.

Over the course of this series, this struct will grow substantially. By the time we're done, it will also hold a consistent hash ring, a coordinator for distributed reads and writes, a membership manager, a failure detector, an anti-entropy engine, a hinted handoff system, an RPC layer, an admission controller, and a metrics collector. But right now, we start lean.

Configuration

The configuration starts equally simple:

// pkg/dynamo/config.go

type Config struct {
    StorageEngine string
    StoragePath   string
}

func DefaultConfig() *Config {
    return &Config{
        StorageEngine: "memory",
        StoragePath:   "./data",
    }
}

Two fields. By the end of the series, Config will have over 30 fields covering replication factors, quorum sizes, gossip intervals, timeout durations, circuit breaker thresholds, and more. Each new concept we add will bring its own configuration knobs. But starting with just a storage engine choice keeps the first version immediately understandable.
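As a taste of where this goes, here is a hedged sketch of how a few Part 4 era fields might look. The field names are assumptions, not the series' final API, though (3, 2, 2) is the common (N, R, W) configuration the paper itself cites:

```go
package main

import "fmt"

// Config as it might look a few posts from now. N, R, and W are the
// standard Dynamo quorum parameters.
type Config struct {
	StorageEngine string
	StoragePath   string
	N             int // replication factor
	R             int // read quorum size
	W             int // write quorum size
}

func DefaultConfig() *Config {
	return &Config{
		StorageEngine: "memory",
		StoragePath:   "./data",
		N:             3, // the paper cites (3, 2, 2) as a common configuration
		R:             2,
		W:             2,
	}
}

func main() {
	c := DefaultConfig()
	// R + W > N guarantees that read and write quorums overlap on at
	// least one replica, so a read sees the latest committed write.
	fmt.Println(c.R+c.W > c.N) // true
}
```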

Node Lifecycle

Creating and running a node follows a straightforward pattern:

func NewNode(id, address string, config *Config) (*Node, error) {
    if config == nil {
        config = DefaultConfig()
    }

    node := &Node{
        id:      id,
        address: address,
        config:  config,
        stopCh:  make(chan struct{}),
    }

    // Initialize storage engine
    var err error
    node.storage, err = storage.NewStorage(config.StorageEngine)
    if err != nil {
        return nil, fmt.Errorf("failed to initialize storage: %w", err)
    }

    return node, nil
}

func (n *Node) Start() error {
    // Nothing to start yet - no background processes
    return nil
}

func (n *Node) Stop() error {
    select {
    case <-n.stopCh:
        // Already stopped; closing stopCh twice would panic.
    default:
        close(n.stopCh)
    }
    return n.storage.Close()
}

NewNode falls back to the default configuration when none is supplied and initializes the storage engine. Start is a no-op for now - there are no background processes yet. In later posts, Start will launch goroutines for gossip, failure detection, anti-entropy, hinted handoff, and tombstone compaction. Stop signals shutdown via the stopCh channel and closes the storage engine.

At this point, we can write a simple program that creates a node and uses it as a local key-value store:

func main() {
    config := DefaultConfig()

    node, err := NewNode("node1", "localhost:8001", config)
    if err != nil {
        log.Fatal(err)
    }
    if err := node.Start(); err != nil {
        log.Fatal(err)
    }
    defer node.Stop()

    // Store a value (talking to the storage engine directly for now;
    // the Node grows its own Get/Put once coordination arrives)
    node.storage.Put("user:123", []byte("Alice"))

    // Retrieve it
    value, _ := node.storage.Get("user:123")
    fmt.Printf("Value: %s\n", value)  // "Alice"
}

Not very impressive yet - it's just a wrapper around a map. But the skeleton is in place. The Node owns a Storage, has a lifecycle (Start/Stop), and is configured via Config. Everything we build from here snaps into this structure.

Package Layout

The codebase is organized into packages that mirror the paper's subsystems:

pkg/
├── dynamo/          # The Node itself - lifecycle, config, coordination
├── ring/            # Consistent hashing and virtual nodes (Part 2)
├── versioning/      # Vector clocks and conflict resolution (Part 3)
├── replication/     # Quorum coordination and hinted handoff (Part 4)
├── membership/      # Gossip protocol and failure detection (Part 5)
├── synchronization/ # Merkle trees and anti-entropy (Part 5)
├── storage/         # Storage engine interface and implementations
├── rpc/             # Inter-node communication (Part 6)
├── metrics/         # Observability and monitoring (Part 6)
└── types/           # Shared interfaces to avoid circular imports

tests/
├── unit/            # Per-component tests
├── integration/     # Multi-node cluster tests
└── performance/     # Benchmarks

examples/
├── simple/          # Basic single-node usage
├── shopping_cart/   # Conflict resolution demo
└── cluster/         # Multi-node cluster demo

Each package has a clear responsibility and communicates with others through interfaces defined in pkg/types. This keeps dependencies flowing in one direction and makes each subsystem testable in isolation. The types package exists specifically to break circular imports - for example, both the replication and storage packages need to work with versioned values, so rather than importing each other they both depend on the shared types.Storage interface defined in pkg/types.

This structure also mirrors how we'll build the system across posts. Each post introduces one or two new packages and wires them into the Node. By Part 6, every package is in play and the full system is operational.

What's Next

We now have a single node that can store and retrieve data locally. It's not distributed, it doesn't replicate, and it has no awareness of other nodes. In Part 2, we'll change that by introducing consistent hashing - the mechanism that lets us partition data across multiple nodes, add or remove nodes without reshuffling everything, and determine which node is responsible for any given key. That's where Dynamo starts becoming a distributed system.