Benchmarks
This document defines the benchmark methodology for OpenNANDLab and serves as the living record of results across versions.
Methodology
Reproducibility contract
Every benchmark in this file must be reproducible with a single command:
opennandlab benchmark --workload <name> --config docs/resources/config/config.yaml
Results are deterministic for a given config and random seed (default: 42). Pass --seed to use a different seed.
What we measure
| Metric | Definition | Unit |
|---|---|---|
| WAF | NAND bytes written / host bytes written | × (dimensionless, ≥ 1) |
| Throughput | Host bytes written per second | MB/s |
| Avg latency | Mean write latency across all ops | µs |
| P99 latency | 99th-percentile write latency | µs |
| P999 latency | 99.9th-percentile write latency | µs |
| GC cycles | Number of block erases triggered by GC | count |
| ECC correction rate | Errors corrected / total reads | % |
| UBER | Uncorrectable errors / total reads | rate |
| WL stddev | Std deviation of per-block erase counts | count |
| Lifetime estimate | max_pe / current_erase_rate | days |
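The ratio-style metrics in the table can be sketched from raw counters as follows. This is a minimal illustration, not the OpenNANDLab implementation: the function names are hypothetical, the erase rate is assumed to be expressed in erases per block per day, and the population form of the standard deviation is an assumed convention.

```python
import statistics

def waf(nand_bytes_written: int, host_bytes_written: int) -> float:
    """Write amplification: NAND bytes written / host bytes written."""
    if host_bytes_written <= 0:
        raise ValueError("host_bytes_written must be positive")
    return nand_bytes_written / host_bytes_written

def ecc_correction_rate_pct(errors_corrected: int, total_reads: int) -> float:
    """Errors corrected / total reads, expressed as a percentage."""
    return 100.0 * errors_corrected / total_reads

def wl_stddev(erase_counts: list[int]) -> float:
    """Standard deviation of per-block erase counts (population form assumed)."""
    return statistics.pstdev(erase_counts)

def lifetime_days(max_pe: int, erases_per_block_per_day: float) -> float:
    """max_pe / current_erase_rate, assuming the rate is per block per day."""
    return max_pe / erases_per_block_per_day

# Hypothetical counters, for illustration only.
print(waf(3210, 1000))
print(lifetime_days(3000, 1.5))
```

Keeping each metric as a pure function of counters makes it easy to recompute the table from a saved run without re-running the simulation.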
Standard test configs
TLC-standard (docs/resources/config/tlc_standard.yaml):
nand:
  cell_type: TLC
  blocks_per_plane: 1024
  pages_per_block: 256
  page_size_bytes: 4096
  max_pe_cycles: 3000
ftl:
  gc_policy: greedy
  gc_trigger_free_pct: 0.10
  over_provisioning_pct: 0.07
ecc:
  algorithm: bch
  bch_m: 8
  bch_t: 4
MLC-enterprise (docs/resources/config/mlc_enterprise.yaml):
nand:
  cell_type: MLC
  max_pe_cycles: 10000
  rber_floor: 1.0e-9
  rber_ceil: 5.0e-4
ftl:
  gc_policy: cost_benefit
  over_provisioning_pct: 0.20
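To relate the geometry fields to the workload sizes, the raw capacity of one plane can be derived from the TLC-standard values. The sketch below is illustrative only: the helper names are hypothetical, the number of planes is not stated in the config excerpts, and treating over_provisioning_pct as the fraction of raw capacity held back from the host is an assumption (other definitions divide by user capacity instead).

```python
GIB = 1024 ** 3

def raw_plane_bytes(blocks_per_plane: int, pages_per_block: int,
                    page_size_bytes: int) -> int:
    """Raw NAND capacity of a single plane, in bytes."""
    return blocks_per_plane * pages_per_block * page_size_bytes

def user_bytes(raw_bytes: int, over_provisioning_pct: float) -> int:
    """Host-visible capacity, ASSUMING OP is a fraction of raw capacity."""
    return round(raw_bytes * (1.0 - over_provisioning_pct))

# Values taken from the TLC-standard config above.
raw = raw_plane_bytes(blocks_per_plane=1024, pages_per_block=256,
                      page_size_bytes=4096)
print(raw / GIB)               # raw capacity of one plane, in GiB
print(user_bytes(raw, 0.07))   # host-visible bytes under the stated assumption
```

With these values one plane holds exactly 1 GiB of raw NAND, the same size as the 1 GiB written by W1 and W2.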
Workload Definitions
W1: Sequential write
1 GiB total writes, 4 KiB pages, queue depth 1
Access pattern: LBA 0 → max, sequential
W2: Random write
1 GiB total writes, 4 KiB pages, queue depth 32
Access pattern: uniform random LBA
W3: Mixed 70/30
1 GiB total I/O, 70% reads / 30% writes, 4 KiB
Access pattern: 80/20 Zipf (hot/cold)
W4: Database OLTP (simulated)
512 MiB, 8 KiB average I/O, 50% read / 50% write
Random access pattern
W5: Long-run aging
50× device capacity writes (full endurance test)
Random write, records RBER and WAF evolution over time
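The access patterns above can be sketched as seeded LBA generators. This is a hedged illustration rather than the simulator's workload code: the function names are hypothetical, wrapping the sequential pattern at the last LBA is an assumption, and the hot/cold generator is a simplified two-region stand-in for the 80/20 Zipf pattern, not a true Zipf sampler.

```python
import random
from typing import Iterator

def sequential_lbas(num_lbas: int, count: int) -> Iterator[int]:
    """LBA 0 -> max in order; wraps to 0 after the last LBA (assumed)."""
    for i in range(count):
        yield i % num_lbas

def uniform_lbas(num_lbas: int, count: int, seed: int = 42) -> Iterator[int]:
    """Uniform random LBAs from a private, seeded generator."""
    rng = random.Random(seed)
    for _ in range(count):
        yield rng.randrange(num_lbas)

def hot_cold_lbas(num_lbas: int, count: int, seed: int = 42,
                  hot_fraction: float = 0.2,
                  hot_prob: float = 0.8) -> Iterator[int]:
    """Send hot_prob of accesses to the first hot_fraction of the LBA space."""
    if num_lbas < 2:
        raise ValueError("need at least one hot and one cold LBA")
    rng = random.Random(seed)
    hot_size = min(num_lbas - 1, max(1, int(num_lbas * hot_fraction)))
    for _ in range(count):
        if rng.random() < hot_prob:
            yield rng.randrange(hot_size)                          # hot region
        else:
            yield hot_size + rng.randrange(num_lbas - hot_size)    # cold region

# Same seed, same sequence: the basis of the reproducibility contract.
assert list(uniform_lbas(1000, 5)) == list(uniform_lbas(1000, 5))
```

Using a private `random.Random(seed)` instance instead of the module-level functions keeps each workload's sequence independent of any other random draws in the process.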
Results — v2.0 (TLC-standard config, greedy GC, CPython 3.12, Apple M2)
WAF comparison: GC policies
| Policy | W1 (Seq) | W2 (Rand) | W3 (Mixed) |
|---|---|---|---|
| Greedy | 1.07× | 3.21× | 2.18× |
| Cost-benefit | 1.05× | 2.63× | 1.94× |
| Δ | -1.9% | -18.1% | -11.0% |
Cost-benefit GC reduces WAF by ~18% on random-write workloads. Trade-off: +12% GC selection overhead.
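The difference between the two policies comes down to how the victim block is chosen. The sketch below is a minimal illustration, not OpenNANDLab's code: the `Block` type and function names are hypothetical, and the cost-benefit score uses the classic log-structured file system form, age × (1 − u) / (1 + u), which is an assumption about what the simulator implements.

```python
from dataclasses import dataclass

@dataclass
class Block:
    block_id: int
    valid_pages: int
    age: float  # time since the block was last written, arbitrary units

def pick_greedy(blocks: list[Block]) -> Block:
    """Greedy: reclaim the block with the fewest valid pages."""
    return min(blocks, key=lambda b: b.valid_pages)

def cost_benefit_score(block: Block, pages_per_block: int) -> float:
    """Benefit/cost = age * (1 - u) / (1 + u), with u the valid-page fraction."""
    u = block.valid_pages / pages_per_block
    return block.age * (1.0 - u) / (1.0 + u)

def pick_cost_benefit(blocks: list[Block], pages_per_block: int) -> Block:
    """Cost-benefit: reclaim the block with the highest score."""
    return max(blocks, key=lambda b: cost_benefit_score(b, pages_per_block))

# Hypothetical candidates: block 0 is emptier, block 1 is much colder.
candidates = [Block(0, valid_pages=10, age=1.0),
              Block(1, valid_pages=20, age=100.0)]
print(pick_greedy(candidates).block_id)             # 0
print(pick_cost_benefit(candidates, 256).block_id)  # 1
```

Scoring every candidate is more work than a single minimum over valid-page counts, which is consistent with the selection overhead noted above.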
Latency (µs) — W2 random write, greedy GC
| Percentile | Without GC spike | With GC spike |
|---|---|---|
| P50 | 12 | 12 |
| P90 | 19 | 890 |
| P99 | 45 | 2 100 |
| P999 | 78 | 8 400 |
GC spikes dominate tail latency. Cost-benefit GC reduces P999 by ~30% by selecting blocks with fewer valid pages (less copying work).
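A small synthetic example shows why a few GC stalls leave the median untouched while inflating the tail and the mean. The latency values below are hypothetical and chosen for easy arithmetic; the nearest-rank percentile is one common convention and may differ from the one the simulator uses.

```python
import math

def nearest_rank(samples: list[float], pct: float) -> float:
    """Nearest-rank percentile: the value at rank ceil(pct/100 * N)."""
    if not samples:
        raise ValueError("samples must be non-empty")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]

# Hypothetical trace: 980 fast writes and 20 writes stalled behind GC.
latencies_us = [12.0] * 980 + [2000.0] * 20

mean_us = sum(latencies_us) / len(latencies_us)
p50_us = nearest_rank(latencies_us, 50)
p99_us = nearest_rank(latencies_us, 99)

print(f"mean={mean_us:.2f} p50={p50_us:.0f} p99={p99_us:.0f}")
# mean=51.76 p50=12 p99=2000
```

Here only 2% of the operations are slow, yet P99 lands on a stalled write and the mean is more than four times the median.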
Wear distribution — W2 random write, 10 000 host writes
| Policy | Min PE | Max PE | Mean PE | Stddev |
|---|---|---|---|---|
| Dynamic WL | 8 | 14 | 11.2 | 1.3 |
| No WL | 0 | 31 | 10.9 | 6.8 |
Dynamic wear leveling reduces PE stddev by 5× on this workload.
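The effect can be reproduced qualitatively with a toy model in which, at each erase, a handful of blocks are free and dynamic wear leveling picks the least-worn one. Everything here is an assumption made for illustration (the function name, the candidate-sampling model, and the parameters), so the numbers it prints are not comparable to the table.

```python
import random
import statistics

def simulate_erases(num_blocks: int, num_erases: int, wear_level: bool,
                    candidates: int = 8, seed: int = 42) -> list[int]:
    """Toy model: each step, `candidates` random blocks are free to reuse."""
    rng = random.Random(seed)
    erase_counts = [0] * num_blocks
    for _ in range(num_erases):
        free = rng.sample(range(num_blocks), candidates)
        if wear_level:
            # Dynamic WL: reuse the free block with the lowest erase count.
            chosen = min(free, key=lambda b: erase_counts[b])
        else:
            # No WL: reuse whichever free block comes first.
            chosen = free[0]
        erase_counts[chosen] += 1
    return erase_counts

with_wl = simulate_erases(64, 2000, wear_level=True)
without_wl = simulate_erases(64, 2000, wear_level=False)
print(statistics.pstdev(with_wl), statistics.pstdev(without_wl))
```

Both runs perform the same number of erases, so the means match and only the spread differs, which mirrors the shape of the table above.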
ECC — BCH vs. LDPC (hard-decision) at RBER = 1e-4
| Algorithm | BLER | Correction latency |
|---|---|---|
| BCH (m=8, t=4) | 2.1e-5 | 280 µs |
| LDPC (n=1024, hard) | 8.3e-6 | 410 µs |
| LDPC (n=1024, soft) | 1.2e-6 | 580 µs |
Soft-decision LDPC achieves ~17× lower BLER than BCH at the same RBER (1.2e-6 vs. 2.1e-5), at the cost of roughly 2× the correction latency (580 µs vs. 280 µs).
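For intuition about how correction capability drives BLER, block failure of a t-error-correcting code can be modelled as a binomial tail under independent bit errors. This idealized model ignores decoder details, and the codeword length used below is illustrative, so it is not expected to reproduce the simulator's BLER values in the table.

```python
import math

def block_error_rate(n_bits: int, t: int, rber: float) -> float:
    """P(more than t bit errors in n_bits), assuming independent bit errors."""
    p_correctable = sum(
        math.comb(n_bits, k) * rber ** k * (1.0 - rber) ** (n_bits - k)
        for k in range(t + 1)
    )
    # Clamp tiny negative values caused by floating-point rounding.
    return max(0.0, 1.0 - p_correctable)

# Illustrative 255-bit codeword at a raw bit error rate of 1e-3.
for t in (2, 4, 8):
    print(t, block_error_rate(255, t, 1e-3))
```

Summing only the first t + 1 terms keeps the binomial coefficients small, which avoids overflow when the same function is applied to much longer codewords.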
RBER vs. P/E cycles (TLC, Weibull model)
| P/E cycles | RBER |
|---|---|
| 0 | 1.00e-8 |
| 500 | 4.12e-6 |
| 1 500 | 3.24e-4 |
| 3 000 | 9.87e-4 |
BCH (t=4) corrects up to roughly RBER = 1e-3 per page. At max P/E cycles (3 000, RBER = 9.87e-4), TLC sits right at that limit, so correction becomes marginal. This motivates soft-decision LDPC for end-of-life reliability.
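One plausible shape for a Weibull-style wear model is a Weibull CDF that carries RBER from a floor at zero cycles towards a ceiling at end of life. The functional form, the function name, and the `scale` and `shape` values below are all assumptions; they are not fitted to the table and may differ from the model the simulator uses.

```python
import math

def rber_weibull(pe_cycles: float, rber_floor: float, rber_ceil: float,
                 scale: float, shape: float) -> float:
    """RBER rising from rber_floor towards rber_ceil along a Weibull CDF."""
    wear = 1.0 - math.exp(-((pe_cycles / scale) ** shape))
    return rber_floor + (rber_ceil - rber_floor) * wear

# Hypothetical parameters, for illustration only.
for pe in (0, 500, 1500, 3000):
    print(pe, f"{rber_weibull(pe, 1e-8, 1e-3, scale=2000.0, shape=2.0):.2e}")
```

A floor and ceiling pair of this kind corresponds naturally to the rber_floor and rber_ceil fields in the MLC-enterprise config, although how the simulator consumes those fields is not specified here.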
How to Add a Benchmark Result
Run:
opennandlab benchmark --workload <W> --config <config.yaml> --seed 42 --output results.json
Copy the key metrics from results.json into the appropriate table above.
Note the Python version, OS, and hardware.
Open a PR. CI will verify the result is reproducible.
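Copying metrics by hand is error-prone, so a small helper can format them as a table row. The sketch below is hypothetical: the schema of results.json is not documented here, so the key names (`waf`, `p99_latency_us`) and the `table_row` helper are assumptions used only to show the idea.

```python
import json

def table_row(label: str, metrics: dict, keys: list[str]) -> str:
    """Format selected metrics as one pipe-delimited table row."""
    cells = [label] + [f"{metrics[key]:.2f}" for key in keys]
    return "| " + " | ".join(cells) + " |"

# Stand-in for the parsed contents of results.json (assumed key names).
sample = json.loads('{"waf": 3.21, "p99_latency_us": 2100.0}')
print(table_row("Greedy", sample, ["waf", "p99_latency_us"]))
# | Greedy | 3.21 | 2100.00 |
```

In practice the dictionary would come from `json.load` on the file written by `--output`, with the key list adjusted to whatever that file actually contains.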
Benchmark Anti-Patterns
| Anti-pattern | Why it’s wrong |
|---|---|
| Benchmarking without a fixed seed | Results are non-reproducible |
| Comparing configs with different OP% | OP% is the dominant WAF variable, so it must be held constant |
| Measuring GC latency during a sequential workload | GC rarely triggers under sequential writes, so the measurement is meaningless |
| Reporting avg latency without P99 | The average hides GC tail spikes; always report P99 or P999 |
| Comparing BCH t=4 to LDPC n=4096 | ECC schemes must be compared at the same code rate for a fair comparison |
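To check whether two ECC configurations are comparable, their code rates can be computed first. For a primitive binary BCH code of length n = 2^m − 1 that corrects t errors, the number of parity bits is at most m·t, which gives a lower bound on the rate. The helper name below is hypothetical, and the LDPC rate has to come from that code's own (n, k), which is not given in this document.

```python
def bch_rate_bound(m: int, t: int) -> tuple[int, int, float]:
    """Return (n, k_min, rate_min) for a primitive binary BCH code.

    Uses the bound n - k <= m * t, so k_min and rate_min are lower bounds.
    """
    n = 2 ** m - 1
    k_min = n - m * t
    if k_min <= 0:
        raise ValueError("t is too large for this m")
    return n, k_min, k_min / n

# BCH settings from the TLC-standard config: bch_m = 8, bch_t = 4.
n, k_min, rate_min = bch_rate_bound(8, 4)
print(n, k_min, round(rate_min, 4))  # 255 223 0.8745
```

An LDPC code would need a rate close to this value for the BLER and latency figures to be compared on equal footing.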
Maintained by @muditbhargava66. Last updated: 2026-05.