Benchmarks

This document defines the benchmark methodology for OpenNANDLab and serves as the living record of results across versions.

Methodology

Reproducibility contract

Every benchmark in this file must be reproducible with a single command:

opennandlab benchmark --workload <name> --config docs/resources/config/config.yaml

Results are deterministic for a given config and random seed (default seed: 42). Pass --seed to use a different seed.
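The determinism contract can be illustrated with a seeded generator. This is a minimal sketch, not OpenNANDLab's internals: `generate_lbas` and its parameters are hypothetical, and the only value taken from the text above is the default seed of 42.

```python
import random

def generate_lbas(count, max_lba, seed=42):
    """Return `count` pseudo-random LBAs from a private, seeded generator."""
    # A private Random instance keeps the sequence independent of any other
    # code that touches the global random state.
    rng = random.Random(seed)
    return [rng.randrange(max_lba) for _ in range(count)]

run_a = generate_lbas(1000, max_lba=262_144)           # default seed
run_b = generate_lbas(1000, max_lba=262_144)           # same seed, same sequence
run_c = generate_lbas(1000, max_lba=262_144, seed=7)   # different seed

print(run_a == run_b)
print(run_a == run_c)
```

Two runs with the same seed produce identical LBA sequences, which is the property a reproducible benchmark needs from every source of randomness it uses.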

What we measure

Metric	Definition	Unit
WAF	NAND bytes written / host bytes written	× (dimensionless, ≥ 1)
Throughput	Host bytes written per second	MB/s
Avg latency	Mean write latency across all ops	µs
P99 latency	99th-percentile write latency	µs
P999 latency	99.9th-percentile write latency	µs
GC cycles	Number of block erases triggered by GC	count
ECC correction rate	Errors corrected / total reads	%
UBER	Uncorrectable errors / total reads	rate
WL stddev	Std deviation of per-block erase counts	count
Lifetime estimate	max_pe / current_erase_rate (erase rate in P/E cycles per block per day)	days
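The ratio metrics in the table reduce to simple arithmetic on raw counters. The sketch below follows the definitions above; the function names and the sample counter values are illustrative assumptions, not the simulator's API or measured results.

```python
def write_amplification(nand_bytes_written, host_bytes_written):
    """WAF: NAND bytes written per host byte written."""
    return nand_bytes_written / host_bytes_written

def throughput_mb_s(host_bytes_written, elapsed_seconds):
    """Host write throughput. Assumes 1 MB = 10**6 bytes."""
    return host_bytes_written / elapsed_seconds / 1e6

def lifetime_days(max_pe, erases_per_block_per_day):
    """Lifetime estimate. Assumes the erase rate is per block, per day."""
    return max_pe / erases_per_block_per_day

# Illustrative counters only.
waf = write_amplification(nand_bytes_written=3_000_000, host_bytes_written=1_000_000)
mb_s = throughput_mb_s(host_bytes_written=500_000_000, elapsed_seconds=10)
days = lifetime_days(max_pe=3000, erases_per_block_per_day=2.0)

print(waf, mb_s, days)
```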

Standard test configs

TLC-standard (docs/resources/config/tlc_standard.yaml):

nand:
  cell_type: TLC
  blocks_per_plane: 1024
  pages_per_block: 256
  page_size_bytes: 4096
  max_pe_cycles: 3000

ftl:
  gc_policy: greedy
  gc_trigger_free_pct: 0.10
  over_provisioning_pct: 0.07

ecc:
  algorithm: bch
  bch_m: 8
  bch_t: 4
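
The geometry fields above fix the raw size of one plane. The arithmetic below uses only values from the TLC-standard config. The number of planes is not stated here, so the figures are per plane, and the reading of gc_trigger_free_pct as a fraction of the blocks in a plane is an assumption.

```python
# Values copied from the TLC-standard config.
blocks_per_plane = 1024
pages_per_block = 256
page_size_bytes = 4096
gc_trigger_free_pct = 0.10

block_bytes = pages_per_block * page_size_bytes   # bytes per erase block
plane_bytes = blocks_per_plane * block_bytes      # raw bytes per plane

# Assumption: GC starts when free blocks drop below this fraction of a plane,
# rounded down to a whole block.
gc_trigger_blocks = int(blocks_per_plane * gc_trigger_free_pct)

print(f"block size: {block_bytes // 1024} KiB")
print(f"raw plane capacity: {plane_bytes // 2**20} MiB")
print(f"GC trigger threshold: {gc_trigger_blocks} free blocks")
```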

MLC-enterprise (docs/resources/config/mlc_enterprise.yaml):

nand:
  cell_type: MLC
  max_pe_cycles: 10000
  rber_floor: 1.0e-9
  rber_ceil: 5.0e-4

ftl:
  gc_policy: cost_benefit
  over_provisioning_pct: 0.20

Workload Definitions

W1: Sequential write

1 GiB total writes, 4 KiB pages, queue depth 1
Access pattern: LBA 0 → max, sequential

W2: Random write

1 GiB total writes, 4 KiB pages, queue depth 32
Access pattern: uniform random LBA

W3: Mixed 70/30

1 GiB total I/O, 70% reads / 30% writes, 4 KiB
Access pattern: 80/20 Zipf (hot/cold)

W4: Database OLTP (simulated)

512 MiB, 8 KiB average I/O, 50% read / 50% write
Random access pattern

W5: Long-run aging

50× device capacity writes (full endurance test)
Random write, records RBER and WAF evolution over time
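
The access patterns for W1 to W3 can be sketched as LBA generators. These are hypothetical helpers, not the simulator's workload code. Two assumptions are made: sequential access wraps at the end of the LBA space, and the 80/20 pattern is approximated by a two-region hot/cold model (80% of accesses go to the first 20% of LBAs) rather than a true Zipf distribution.

```python
import random

def sequential_lbas(count, max_lba):
    """W1-style pattern: LBA 0 upward, wrapping at max_lba (assumed)."""
    return [i % max_lba for i in range(count)]

def uniform_lbas(count, max_lba, rng):
    """W2-style pattern: every LBA is equally likely."""
    return [rng.randrange(max_lba) for _ in range(count)]

def hot_cold_lbas(count, max_lba, rng, hot_fraction=0.2, hot_access_prob=0.8):
    """W3-style approximation: most accesses land in a small hot region."""
    hot_limit = max(1, int(max_lba * hot_fraction))
    lbas = []
    for _ in range(count):
        if rng.random() < hot_access_prob:
            lbas.append(rng.randrange(hot_limit))           # hot region
        else:
            lbas.append(rng.randrange(hot_limit, max_lba))  # cold region
    return lbas

rng = random.Random(42)
seq = sequential_lbas(5, max_lba=3)
uni = uniform_lbas(10_000, max_lba=1000, rng=rng)
mixed = hot_cold_lbas(10_000, max_lba=1000, rng=rng)
hot_share = sum(lba < 200 for lba in mixed) / len(mixed)
print(seq, round(hot_share, 2))
```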

Results — v2.0 (TLC-standard config, greedy GC, CPython 3.12, Apple M2)

WAF comparison: GC policies

Policy	W1 (Seq)	W2 (Rand)	W3 (Mixed)
Greedy	1.07×	3.21×	2.18×
Cost-benefit	1.05×	2.63×	1.94×
Δ	-1.9%	-18.1%	-11.0%

Cost-benefit GC reduces WAF by ~18% on the random-write workload (W2) and by ~11% on the mixed workload (W3). Trade-off: +12% GC selection overhead relative to greedy.
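
The Δ row is the relative change in WAF from greedy to cost-benefit. A short check using the values from the table (the helper name is illustrative):

```python
def relative_change_pct(baseline, candidate):
    """Percentage change of candidate relative to baseline."""
    return (candidate - baseline) / baseline * 100

greedy = {"W1": 1.07, "W2": 3.21, "W3": 2.18}
cost_benefit = {"W1": 1.05, "W2": 2.63, "W3": 1.94}

deltas = {
    workload: round(relative_change_pct(greedy[workload], cost_benefit[workload]), 1)
    for workload in greedy
}

for workload, delta in deltas.items():
    print(f"{workload}: {delta:+.1f}%")
```

Negative values mean cost-benefit writes less to NAND than greedy for the same host writes.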

Latency (µs) — W2 random write, greedy GC

Percentile	Without GC spike	With GC spike
P50	12	12
P90	19	890
P99	45	2 100
P999	78	8 400

GC spikes dominate tail latency: P99 rises from 45 µs to 2 100 µs and P999 from 78 µs to 8 400 µs, while P50 is unchanged. Cost-benefit GC reduces P999 by ~30% by selecting blocks with fewer valid pages (less copying work).
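
A small synthetic example shows why the median stays flat while the tail moves. The latency samples below are made up for illustration, and the nearest-rank percentile is one common definition. The source does not say which percentile method the simulator uses.

```python
import math

def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample such that at least
    pct percent of all samples are less than or equal to it."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]

# Synthetic latencies in µs: 980 fast writes plus 20 writes stalled behind GC.
latencies = [12] * 980 + [2100] * 20

p50 = percentile(latencies, 50)
p99 = percentile(latencies, 99)
mean = sum(latencies) / len(latencies)

print(p50, p99, mean)
```

With only 2% of writes stalled, the median is untouched and the mean shifts modestly, but P99 lands squarely on the stalled writes.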

Wear distribution — W2 random write, 10 000 host writes

Policy	Min PE	Max PE	Mean PE	Stddev
Dynamic WL	8	14	11.2	1.3
No WL	0	31	10.9	6.8

Dynamic wear leveling reduces PE stddev by ~5× on this workload (6.8 → 1.3).
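
The columns in this table are summary statistics over per-block erase counts. A minimal sketch follows, assuming population standard deviation (the source does not say whether population or sample stddev is used). The erase-count lists are illustrative, not simulator output.

```python
import statistics

def wear_summary(erase_counts):
    """Summarize per-block erase counts as in the wear distribution table."""
    return {
        "min": min(erase_counts),
        "max": max(erase_counts),
        "mean": statistics.fmean(erase_counts),
        # Assumption: population stddev over all blocks.
        "stddev": statistics.pstdev(erase_counts),
    }

leveled = wear_summary([10, 11, 12, 11, 10, 12])   # wear spread evenly
skewed = wear_summary([0, 31, 5, 20, 2, 8])        # a few blocks absorb most erases

print(leveled)
print(skewed)
```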

ECC — BCH vs. LDPC (hard- and soft-decision) at RBER = 1e-4

Algorithm	BLER	Correction latency
BCH (m=8, t=4)	2.1e-5	280 µs
LDPC (n=1024, hard)	8.3e-6	410 µs
LDPC (n=1024, soft)	1.2e-6	580 µs

Soft-decision LDPC provides ~17× lower BLER than BCH at the same RBER, at the cost of ~2× the correction latency (580 µs vs. 280 µs).

RBER vs. P/E cycles (TLC, Weibull model)

P/E cycles	RBER
0	1.00e-8
500	4.12e-6
1 500	3.24e-4
3 000	9.87e-4

BCH (t=4) corrects up to ~RBER=1e-3 per page. At max PE (3 000 cycles, RBER = 9.87e-4), correction becomes marginal for TLC, which motivates soft-decision LDPC for end-of-life reliability.
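
To estimate RBER between the tabulated points, one rough option is to interpolate linearly in log space. This is not the simulator's Weibull model, whose parameters are not given here. It is only a sketch for reading values off the table above, and `interpolate_rber` is a hypothetical helper.

```python
import math
from bisect import bisect_right

# (P/E cycles, RBER) pairs copied from the table above.
RBER_TABLE = [(0, 1.00e-8), (500, 4.12e-6), (1500, 3.24e-4), (3000, 9.87e-4)]

def interpolate_rber(pe_cycles):
    """Log-linear interpolation between tabulated points, clamped at the ends."""
    cycles = [c for c, _ in RBER_TABLE]
    if pe_cycles <= cycles[0]:
        return RBER_TABLE[0][1]
    if pe_cycles >= cycles[-1]:
        return RBER_TABLE[-1][1]
    upper = bisect_right(cycles, pe_cycles)        # index of the next table row
    (c0, r0), (c1, r1) = RBER_TABLE[upper - 1], RBER_TABLE[upper]
    frac = (pe_cycles - c0) / (c1 - c0)
    return math.exp(math.log(r0) + frac * (math.log(r1) - math.log(r0)))

estimate = interpolate_rber(2000)
print(f"{estimate:.2e}")
```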

How to Add a Benchmark Result

Run: opennandlab benchmark --workload <W> --config <config.yaml> --seed 42 --output results.json
Copy the key metrics from results.json into the appropriate table above.
Note the Python version, OS, and hardware.
Open a PR — CI will verify the result is reproducible.
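
Copying metrics by hand is error-prone, so a small script can turn results.json into a tab-separated table row. The schema of results.json is not documented here: the keys `waf` and `p99_latency_us` below are hypothetical, and the sample JSON is a stand-in for a real results file.

```python
import json

def format_row(label, metrics, keys):
    """Build one tab-separated table row from selected metric keys."""
    cells = [label] + [str(metrics[key]) for key in keys]
    return "\t".join(cells)

# Stand-in for the contents of results.json (hypothetical keys).
sample_json = '{"waf": 3.21, "p99_latency_us": 2100}'
metrics = json.loads(sample_json)

row = format_row("Greedy", metrics, ["waf", "p99_latency_us"])
print(row)
```

For a real file, replace the `json.loads` call with `json.load` on the opened results.json and adjust the key list to match the actual output.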

Benchmark Anti-Patterns

Anti-pattern	Why it’s wrong
Benchmarking without a fixed seed	Results are non-reproducible
Comparing configs with different OP%	OP% is the dominant WAF variable — it must be held constant
Measuring GC latency during a sequential workload	GC rarely triggers sequentially — the measurement is meaningless
Reporting avg latency without P99	Average hides GC tail spikes — always report P99 or P999
Comparing BCH t=4 to LDPC n=4096	Must compare at same code rate for a fair ECC comparison
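
The first two anti-patterns can be caught mechanically before two runs are compared. The sketch below assumes each run is described by a plain dict with `seed` and `over_provisioning_pct` keys. That structure and the function name are hypothetical, not part of OpenNANDLab.

```python
def comparison_problems(run_a, run_b):
    """Return a list of reasons why two runs should not be compared directly."""
    problems = []
    for name, run in (("A", run_a), ("B", run_b)):
        if run.get("seed") is None:
            problems.append(f"run {name} has no fixed seed")
    if run_a.get("over_provisioning_pct") != run_b.get("over_provisioning_pct"):
        problems.append("over-provisioning differs between runs")
    return problems

tlc_run = {"seed": 42, "over_provisioning_pct": 0.07}
mlc_run = {"seed": 42, "over_provisioning_pct": 0.20}
unseeded_run = {"over_provisioning_pct": 0.07}

print(comparison_problems(tlc_run, tlc_run))
print(comparison_problems(tlc_run, mlc_run))
print(comparison_problems(tlc_run, unseeded_run))
```

Note that the two standard configs in this document use different OP values (0.07 and 0.20), so WAF figures from them fall under the second anti-pattern unless OP is aligned first.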

Maintained by @muditbhargava66. Last updated: 2026-05.