Benchmarks

This document defines the benchmark methodology for OpenNANDLab and serves as the living record of results across versions.


Methodology

Reproducibility contract

Every benchmark in this file must be reproducible with a single command:

opennandlab benchmark --workload <name> --config docs/resources/config/config.yaml

Results are deterministic for a given config and random seed. The default seed is 42; pass --seed to change it.

What we measure

| Metric | Definition | Unit |
|---|---|---|
| WAF | NAND bytes written / host bytes written | × (dimensionless, ≥ 1) |
| Throughput | Host bytes written per second | MB/s |
| Avg latency | Mean write latency across all ops | µs |
| P99 latency | 99th-percentile write latency | µs |
| P999 latency | 99.9th-percentile write latency | µs |
| GC cycles | Number of block erases triggered by GC | count |
| ECC correction rate | Errors corrected / total reads | % |
| UBER | Uncorrectable errors / total reads | rate |
| WL stddev | Std deviation of per-block erase counts | count |
| Lifetime estimate | max_pe / current_erase_rate | days |
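
As a rough illustration of how several of these metrics relate to raw counters, the sketch below computes WAF, average and tail latency, WL stddev, and the lifetime estimate from plain Python values. The function names, the nearest-rank percentile method, and the input numbers are assumptions made for this example; they are not taken from the OpenNANDLab code base.

```python
import math
import statistics

def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample with at least pct% of samples at or below it."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]

def summarize(nand_bytes, host_bytes, latencies_us, erase_counts,
              max_pe_cycles, erases_per_block_per_day):
    """Hypothetical helper: derive the table's metrics from raw counters."""
    return {
        "waf": nand_bytes / host_bytes,
        "avg_latency_us": statistics.fmean(latencies_us),
        "p99_latency_us": percentile(latencies_us, 99),
        "p999_latency_us": percentile(latencies_us, 99.9),
        "wl_stddev": statistics.pstdev(erase_counts),
        "lifetime_days": max_pe_cycles / erases_per_block_per_day,
    }

# Made-up input: 980 fast writes plus 20 writes stalled behind GC.
latencies = [12.0] * 980 + [2000.0] * 20
summary = summarize(
    nand_bytes=3_210, host_bytes=1_000,
    latencies_us=latencies, erase_counts=[8, 10, 12, 14],
    max_pe_cycles=3000, erases_per_block_per_day=1.5,
)
for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```

With this input the average latency stays in the tens of microseconds while P99 sits at the stalled value, which is why the table lists both.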

Standard test configs

TLC-standard (docs/resources/config/tlc_standard.yaml):

nand:
  cell_type: TLC
  blocks_per_plane: 1024
  pages_per_block: 256
  page_size_bytes: 4096
  max_pe_cycles: 3000

ftl:
  gc_policy: greedy
  gc_trigger_free_pct: 0.10
  over_provisioning_pct: 0.07

ecc:
  algorithm: bch
  bch_m: 8
  bch_t: 4
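
For orientation, the geometry above can be turned into capacities with a few lines of arithmetic. This is only a sketch: the number of planes is not stated in the config, so the figures are per plane, and treating over_provisioning_pct as the fraction of raw capacity withheld from the host is an assumption about how the simulator interprets that field.

```python
# Values copied from the TLC-standard config above.
blocks_per_plane = 1024
pages_per_block = 256
page_size_bytes = 4096
over_provisioning_pct = 0.07

raw_bytes_per_plane = blocks_per_plane * pages_per_block * page_size_bytes

# Assumption: OP is the fraction of raw capacity hidden from the host.
user_bytes_per_plane = int(raw_bytes_per_plane * (1 - over_provisioning_pct))

print(raw_bytes_per_plane / 2**30)   # raw GiB per plane
print(user_bytes_per_plane / 2**20)  # host-visible MiB per plane
```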

MLC-enterprise (docs/resources/config/mlc_enterprise.yaml):

nand:
  cell_type: MLC
  max_pe_cycles: 10000
  rber_floor: 1.0e-9
  rber_ceil: 5.0e-4

ftl:
  gc_policy: cost_benefit
  over_provisioning_pct: 0.20

Workload Definitions

W1: Sequential write

  • 1 GiB total writes, 4 KiB pages, queue depth 1

  • Access pattern: LBA 0 → max, sequential

W2: Random write

  • 1 GiB total writes, 4 KiB pages, queue depth 32

  • Access pattern: uniform random LBA

W3: Mixed 70/30

  • 1 GiB total I/O, 70% reads / 30% writes, 4 KiB

  • Access pattern: 80/20 Zipf (hot/cold)

W4: Database OLTP (simulated)

  • 512 MiB, 8 KiB average I/O, 50% read / 50% write

  • Random access pattern

W5: Long-run aging

  • 50× device capacity writes (full endurance test)

  • Random write, records RBER and WAF evolution over time
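
The access patterns for W1 to W3 could be generated roughly as follows. This is a minimal sketch, not the simulator's workload generator: the function names are hypothetical, and the hot/cold split (80% of accesses landing on the first 20% of LBAs) is a simplified stand-in for the 80/20 Zipf pattern described above.

```python
import random
from itertools import islice

def sequential_lbas(num_lbas):
    """W1-style pattern: LBA 0 up to num_lbas - 1, in order."""
    yield from range(num_lbas)

def uniform_lbas(num_lbas, seed=42):
    """W2-style pattern: every LBA is equally likely."""
    rng = random.Random(seed)
    while True:
        yield rng.randrange(num_lbas)

def hot_cold_lbas(num_lbas, seed=42, hot_fraction=0.2, hot_access=0.8):
    """W3-style pattern, simplified; requires num_lbas >= 2."""
    rng = random.Random(seed)
    hot_size = max(1, int(num_lbas * hot_fraction))
    while True:
        if rng.random() < hot_access:
            yield rng.randrange(hot_size)  # hot region
        else:
            yield hot_size + rng.randrange(num_lbas - hot_size)  # cold region

# The same seed gives the same sequence, which is what makes runs repeatable.
first = list(islice(uniform_lbas(1000), 5))
again = list(islice(uniform_lbas(1000), 5))
print(first == again)
```

Each generator owns its own random.Random instance, so the sequence depends only on the seed and not on global interpreter state.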


Results — v2.0 (TLC-standard config, greedy GC unless noted, CPython 3.12, Apple M2)

WAF comparison: GC policies

| Policy | W1 (Seq) | W2 (Rand) | W3 (Mixed) |
|---|---|---|---|
| Greedy | 1.07× | 3.21× | 2.18× |
| Cost-benefit | 1.05× | 2.63× | 1.94× |
| Δ | -1.9% | -18.1% | -11.0% |

Relative to greedy, cost-benefit GC reduces WAF by ~18% on the random-write workload (W2) and by ~11% on the mixed workload (W3). Trade-off: +12% GC selection overhead.
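
The Δ row is the relative change of cost-benefit against greedy. A short check of that arithmetic, using the WAF values from the table (the helper name is just for this example):

```python
def relative_change_pct(baseline, candidate):
    """Percentage change of candidate relative to baseline."""
    return (candidate - baseline) / baseline * 100

greedy = {"W1": 1.07, "W2": 3.21, "W3": 2.18}
cost_benefit = {"W1": 1.05, "W2": 2.63, "W3": 1.94}

for workload in greedy:
    delta = relative_change_pct(greedy[workload], cost_benefit[workload])
    print(f"{workload}: {delta:+.1f}%")
# W1: -1.9%
# W2: -18.1%
# W3: -11.0%
```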

Latency (µs) — W2 random write, greedy GC

| Percentile | Without GC spike | With GC spike |
|---|---|---|
| P50 | 12 | 12 |
| P90 | 19 | 890 |
| P99 | 45 | 2 100 |
| P999 | 78 | 8 400 |

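GC spikes dominate tail latency: P999 rises from 78 µs to 8 400 µs when a spike is included, while P50 is unchanged. Cost-benefit GC reduces P999 by ~30% because its victim blocks hold fewer valid pages on average, so each GC cycle copies less data.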
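
To make the difference between the two policies concrete, here is a minimal sketch of victim selection. The cost-benefit score used is the classic log-structured formula (1 - u) · age / (2u), where u is the fraction of valid pages; whether OpenNANDLab uses this exact formula is an assumption, and the Block fields and function names are hypothetical.

```python
from dataclasses import dataclass

PAGES_PER_BLOCK = 256  # matches pages_per_block in the TLC-standard config

@dataclass
class Block:
    block_id: int
    valid_pages: int
    age: float  # time since the block was last written, arbitrary units

def pick_greedy(blocks):
    """Greedy: the block with the fewest valid pages right now."""
    return min(blocks, key=lambda b: b.valid_pages)

def cost_benefit_score(block):
    """Classic cost-benefit score; higher means a better victim."""
    u = block.valid_pages / PAGES_PER_BLOCK
    if u == 0:
        return float("inf")  # nothing to copy, always worth reclaiming
    return (1 - u) * block.age / (2 * u)

def pick_cost_benefit(blocks):
    return max(blocks, key=cost_benefit_score)

candidates = [
    Block(0, valid_pages=64, age=1.0),   # hot: just written, still being invalidated
    Block(1, valid_pages=96, age=50.0),  # cold: stable for a long time
]
print(pick_greedy(candidates).block_id)        # 0
print(pick_cost_benefit(candidates).block_id)  # 1
```

In this toy case cost-benefit picks the colder block even though it holds more valid pages. The hot block is left alone so that more of its pages can be invalidated before it is reclaimed, which tends to lower the copying work over a whole run.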

Wear distribution — W2 random write, 10 000 host writes

| Policy | Min PE | Max PE | Mean PE | Stddev |
|---|---|---|---|---|
| Dynamic WL | 8 | 14 | 11.2 | 1.3 |
| No WL | 0 | 31 | 10.9 | 6.8 |

Dynamic wear leveling reduces PE stddev by ~5× on this workload (6.8 → 1.3) and narrows the min-to-max PE spread from 0–31 to 8–14.
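
One common form of dynamic wear leveling is to serve each new write from the least-worn free block. The sketch below shows only that selection step; it assumes this definition of "dynamic WL", and the function name and sample data are hypothetical rather than taken from the simulator.

```python
def pick_free_block(free_blocks, erase_counts):
    """Dynamic wear leveling, simplified: reuse the least-worn free block."""
    return min(free_blocks, key=lambda block: erase_counts[block])

erase_counts = {0: 12, 1: 3, 2: 9, 3: 3}
free_blocks = [0, 2, 3]  # block 1 is low-wear but currently holds data

chosen = pick_free_block(free_blocks, erase_counts)
print(chosen)  # 3

# Ratio of the two Stddev values from the table above.
print(round(6.8 / 1.3, 1))  # 5.2
```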

ECC — BCH vs. LDPC (hard- and soft-decision) at RBER = 1e-4

| Algorithm | BLER | Correction latency |
|---|---|---|
| BCH (m=8, t=4) | 2.1e-5 | 280 µs |
| LDPC (n=1024, hard) | 8.3e-6 | 410 µs |
| LDPC (n=1024, soft) | 1.2e-6 | 580 µs |

Soft-decision LDPC provides ~17× lower BLER than BCH at the same RBER (1.2e-6 vs. 2.1e-5), at the cost of ~2× the correction latency (580 µs vs. 280 µs).

RBER vs. P/E cycles (TLC, Weibull model)

| P/E cycles | RBER |
|---|---|
| 0 | 1.00e-8 |
| 500 | 4.12e-6 |
| 1 500 | 3.24e-4 |
| 3 000 | 9.87e-4 |

BCH (t=4) corrects up to roughly RBER = 1e-3 per page. At max P/E (3 000 cycles, RBER = 9.87e-4), correction becomes marginal for TLC, which motivates soft-decision LDPC for end-of-life reliability.
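
A textbook way to see why a t-error-correcting code degrades as RBER rises is the binomial tail: a codeword fails when more than t of its n bits flip. The sketch below applies that model to the RBER values in the table. It assumes independent bit errors and a primitive BCH codeword of n = 2^m - 1 = 255 bits for m = 8; the simulator's own codeword layout and error model may differ, so these numbers are not expected to match the BLER figures reported above.

```python
import math

def codeword_failure_prob(n_bits, t, rber):
    """P(more than t of n_bits are in error), assuming independent bit errors."""
    return sum(
        math.comb(n_bits, k) * rber**k * (1 - rber) ** (n_bits - k)
        for k in range(t + 1, n_bits + 1)
    )

N_BITS = 2**8 - 1  # assumed codeword length for bch_m = 8
T = 4              # bch_t from the TLC-standard config

rber_by_pe = [(0, 1.00e-8), (500, 4.12e-6), (1500, 3.24e-4), (3000, 9.87e-4)]
for pe_cycles, rber in rber_by_pe:
    prob = codeword_failure_prob(N_BITS, T, rber)
    print(f"{pe_cycles:>5} P/E: {prob:.2e}")
```

Because the leading term of the tail scales roughly with RBER to the power t + 1, each tenfold increase in RBER raises the failure probability by about five orders of magnitude under this model.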


How to Add a Benchmark Result

  1. Run: opennandlab benchmark --workload <W> --config <config.yaml> --seed 42 --output results.json

  2. Copy the key metrics from results.json into the appropriate table above.

  3. Note the Python version, OS, and hardware.

  4. Open a PR — CI will verify the result is reproducible.
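
Before opening a PR, it can help to run the benchmark twice with the same seed and confirm that the two outputs agree. The sketch below compares two parsed JSON results without assuming anything about their schema. How CI actually performs its check is not described here, so treat this as one possible approach; the function name, tolerance, and sample keys are hypothetical.

```python
import math

def results_match(a, b, rel_tol=1e-9):
    """Recursively compare two parsed JSON values, allowing tiny float drift."""
    if isinstance(a, dict) and isinstance(b, dict):
        return a.keys() == b.keys() and all(
            results_match(a[key], b[key], rel_tol) for key in a
        )
    if isinstance(a, list) and isinstance(b, list):
        return len(a) == len(b) and all(
            results_match(x, y, rel_tol) for x, y in zip(a, b)
        )
    if isinstance(a, float) or isinstance(b, float):
        return (
            isinstance(a, (int, float))
            and isinstance(b, (int, float))
            and math.isclose(a, b, rel_tol=rel_tol)
        )
    return a == b

# Stand-ins for json.load() of two runs; the keys are made up for illustration.
run_a = {"waf": 3.21, "latency_us": [12, 19, 45]}
run_b = {"waf": 3.21, "latency_us": [12, 19, 45]}
run_c = {"waf": 3.30, "latency_us": [12, 19, 45]}

print(results_match(run_a, run_b))  # True
print(results_match(run_a, run_c))  # False
```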


Benchmark Anti-Patterns

| Anti-pattern | Why it’s wrong |
|---|---|
| Benchmarking without a fixed seed | Results are non-reproducible |
| Comparing configs with different OP% | OP% is the dominant WAF variable, so it must be held constant |
| Measuring GC latency during a sequential workload | GC rarely triggers under sequential writes, so the measurement is meaningless |
| Reporting avg latency without P99 | The average hides GC tail spikes; always report P99 or P999 |
| Comparing BCH t=4 to LDPC n=4096 | ECC schemes must be compared at the same code rate for the comparison to be fair |
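
For the last row, the code rate is k/n, the share of each codeword that carries data. For a primitive binary BCH code of length n = 2^m - 1, the parity overhead is at most m·t bits, which gives k = 223 for m = 8, t = 4. The sketch below uses that to pick an LDPC payload size at a matching rate. The LDPC figures are hypothetical and chosen only to illustrate the matching step; the function names are likewise made up for this example.

```python
def code_rate(n_bits, k_bits):
    """Fraction of the codeword that carries data."""
    return k_bits / n_bits

def bch_data_bits(m, t):
    """Data bits of a primitive binary BCH code, using the bound n - k <= m * t."""
    n = 2**m - 1
    return n - m * t

bch_n = 2**8 - 1
bch_k = bch_data_bits(m=8, t=4)
bch_rate = code_rate(bch_n, bch_k)

# Hypothetical LDPC code: choose k so that the rate matches the BCH code.
ldpc_n = 4096
ldpc_k = round(ldpc_n * bch_rate)

print(bch_k, round(bch_rate, 3))                      # 223 0.875
print(ldpc_k, round(code_rate(ldpc_n, ldpc_k), 3))    # 3582 0.875
```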


Maintained by @muditbhargava66. Last updated: 2026-05.