Data Flow

This document describes how data moves through the OpenNANDLab simulator during configuration, write, read, and garbage-collection operations. It follows a request end to end through the optimization modules, the Flash Translation Layer (FTL), and the underlying physical NAND arrays.

1. Configuration Initialization

  • The system loads and parses configuration from the Pydantic-based SimulatorConfig model or from a supplied config.yaml file.

  • Individual components (PageFTL, ECCHandler, CachingSystem, NANDSimulator) initialize with their specific settings.

  • Firmware specifications and block limits are mapped into the execution space.

  • The NANDController validates that the page_size and blocks_per_plane settings are consistent with each other before any request is accepted.
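
The geometry check in the last step can be sketched as follows. This is a minimal illustration, not the simulator's code: it uses a standard-library dataclass as a stand-in for the Pydantic-based SimulatorConfig, the class name GeometryConfig and the pages_per_block field are hypothetical, and the specific rules (power-of-two page size, positive counts) are assumptions about what a compatibility check might enforce.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GeometryConfig:
    """Hypothetical stand-in for the geometry part of SimulatorConfig."""

    page_size: int = 4096          # bytes per page
    pages_per_block: int = 64      # hypothetical field, not named in this document
    blocks_per_plane: int = 1024

    def __post_init__(self):
        # Assumed rule: page_size is a positive power of two.
        if self.page_size <= 0 or self.page_size & (self.page_size - 1):
            raise ValueError("page_size must be a positive power of two")
        # Assumed rule: both counts are positive.
        if self.pages_per_block <= 0 or self.blocks_per_plane <= 0:
            raise ValueError("pages_per_block and blocks_per_plane must be positive")

    @property
    def plane_bytes(self):
        """Raw capacity of one plane in bytes."""
        return self.page_size * self.pages_per_block * self.blocks_per_plane
```

Rejecting an invalid geometry at construction time means every downstream component can assume the dimensions are usable without re-checking them.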

2. Write Operations

  • A LogicalRequest(op, lba, size, data) arrives at the NANDController.

  • Compression Layer: Checks the entropy of the payload and compresses it with LZ4 or Zstandard when doing so is expected to reduce its size.

  • Write Buffer: The data is placed in the FTL’s Write Buffer (defaults to 64 pages) to facilitate sequential flushing.

  • Buffer Flush: When the buffer reaches capacity, the flush routine runs the following steps in order:

    • ECC Layer: The ECCHandler encodes the data with BCH or LDPC and appends the resulting parity bits.

    • Scrambling: Optionally XORs the codeword with a deterministic, seed-derived pattern so that the stored bits avoid worst-case data patterns.

    • L2P Translation: The PageFTL allocates a FREE physical page from its Active Block, updates the logical-to-physical flat array, and flags the previous physical location as INVALID.

    • Wear Leveling: The engine increments the P/E cycle count for the active block and updates its position in the wear-tracking min-heap.

    • Bad Block Management: Verifies that the allocated block is not marked bad before the program operation is issued.

    • Physical Execution: The final codeword bytes are routed to the NANDSimulator.

  • Cache Registration: Successfully written payloads are registered in the caching system for fast subsequent access.
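
The buffering and L2P steps of the write path can be sketched as follows. This is a simplified model under stated assumptions: the class TinyFTL and all of its attributes are hypothetical, a plain list stands in for the NANDSimulator, and compression, ECC, scrambling, wear leveling, and bad block checks are left out so that the buffer, the flat mapping array, and the INVALID marking are easy to follow.

```python
from enum import Enum


class PageState(Enum):
    FREE = 0
    VALID = 1
    INVALID = 2


class TinyFTL:
    """Hypothetical stand-in for PageFTL: buffer writes, map them on flush."""

    def __init__(self, num_lbas, num_pages, buffer_pages=64):
        self.l2p = [None] * num_lbas              # flat logical-to-physical array
        self.state = [PageState.FREE] * num_pages
        self.media = [None] * num_pages           # stands in for the NANDSimulator
        self.buffer = {}                          # lba -> payload awaiting flush
        self.buffer_pages = buffer_pages
        self.next_free = 0                        # next FREE page to allocate

    def write(self, lba, data):
        # Rewriting an LBA that is still buffered replaces the pending payload.
        self.buffer[lba] = data
        if len(self.buffer) >= self.buffer_pages:
            self.flush()

    def flush(self):
        for lba, data in self.buffer.items():
            if self.next_free >= len(self.state):
                raise RuntimeError("no FREE pages left; a real FTL would run GC here")
            ppn = self.next_free
            self.next_free += 1
            old = self.l2p[lba]
            if old is not None:
                self.state[old] = PageState.INVALID   # previous copy is now stale
            self.media[ppn] = data
            self.state[ppn] = PageState.VALID
            self.l2p[lba] = ppn
        self.buffer.clear()
```

Because pages are never overwritten in place, every rewrite of a logical address consumes a new physical page and leaves an INVALID one behind, which is what garbage collection later reclaims.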

3. Read Operations

  • A read request arrives at the NANDController carrying a Logical Block Number (LBN), the same logical address that a LogicalRequest carries in its lba field.

  • Write Buffer Hook: First, the controller checks if the data resides in the FTL’s Write Buffer. If so, it is immediately decompressed and returned.

  • Caching Layer: Checks whether the LBN is present in the CachingSystem (LRU, LFU, or FIFO eviction). A cache hit bypasses the physical hardware simulation entirely.

  • Physical Resolution: If uncached, the PageFTL maps the LBN to its physical page number (PPN) using the L2P array.

  • Hardware Access: The NANDSimulator provides the raw codeword bytes containing data and parity.

  • ECC Decoding: The ECCHandler decodes the codeword.

    • If the physical raw bit error rate (RBER) model has introduced bit errors, the handler locates them and attempts to correct them.

    • If the number of errors exceeds the code's correction capability (for example, more than t bit errors for BCH), it raises an UncorrectableECCError.

  • Decompression: The payload is decompressed back to its original size.

  • The data is returned to the caller and inserted into the CachingSystem.
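
The lookup order of the read path (write buffer, then cache, then L2P and NAND) can be sketched as follows. This is an illustrative model only: LRUCache and read are hypothetical names, only LRU eviction is shown, a plain list stands in for the NANDSimulator, and ECC decoding and decompression are omitted.

```python
from collections import OrderedDict


class LRUCache:
    """Minimal LRU cache; a stand-in for one CachingSystem policy."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.items = OrderedDict()

    def get(self, key):
        if key not in self.items:
            return None
        self.items.move_to_end(key)          # mark as most recently used
        return self.items[key]

    def put(self, key, value):
        self.items[key] = value
        self.items.move_to_end(key)
        if len(self.items) > self.capacity:
            self.items.popitem(last=False)   # evict the least recently used entry


def read(lbn, write_buffer, cache, l2p, media):
    """Return (data, source), where source names the layer that served the read."""
    if lbn in write_buffer:                  # 1. data not yet flushed
        return write_buffer[lbn], "buffer"
    cached = cache.get(lbn)                  # 2. cache hit skips the NAND model
    if cached is not None:
        return cached, "cache"
    ppn = l2p[lbn]                           # 3. resolve the physical page
    if ppn is None:
        raise KeyError(f"LBN {lbn} has never been written")
    data = media[ppn]                        # 4. raw access (ECC omitted here)
    cache.put(lbn, data)                     # 5. populate the cache for next time
    return data, "nand"
```

Checking the write buffer first matters for correctness, not only speed: until a flush happens, the buffer holds the newest copy and the cache or NAND may hold an older one.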

4. Garbage Collection

  • Foreground Activation: If, during a Write Buffer flush, the FTL finds that the free_pool of erased blocks has dropped below a critical threshold (usually 10%), it triggers a GC cycle synchronously.

  • Victim Selection: The configured policy, GreedyGC or CostBenefitGC, evaluates the physical blocks and selects a victim.

  • Data Evacuation: All VALID pages from the chosen victim block are physically read, verified through ECC, and moved to newly allocated pages in a fresh block.

  • Block Reset: The victim block is erased (every byte is reset to 0xFF), which increments its P/E cycle count.

  • Pool Restoration: The freshly erased block is appended back to the FTL’s free_pool.
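
One GC cycle can be sketched as follows. This is a minimal model under stated assumptions: the function names are hypothetical, each block is a list in which a VALID page holds its payload and any other page is None, the greedy rule shown (fewest valid pages wins) is the textbook policy rather than a confirmed description of GreedyGC, and L2P updates and ECC verification of the moved pages are omitted.

```python
def needs_gc(free_pool, total_blocks, threshold=0.10):
    """True when the share of erased blocks falls below the threshold."""
    return len(free_pool) / total_blocks < threshold


def greedy_victim(blocks, free_pool, active):
    """Pick the closed block with the fewest valid pages (cheapest to evacuate)."""
    candidates = [i for i in range(len(blocks)) if i not in free_pool and i != active]
    return min(candidates, key=lambda i: sum(p is not None for p in blocks[i]))


def collect(blocks, erase_counts, free_pool, active):
    """Run one cycle: evacuate valid pages, erase the victim, refill the pool."""
    victim = greedy_victim(blocks, free_pool, active)
    survivors = [p for p in blocks[victim] if p is not None]

    dest = free_pool.pop(0)                           # fresh block for the survivors
    size = len(blocks[dest])
    blocks[dest] = survivors + [None] * (size - len(survivors))

    blocks[victim] = [None] * len(blocks[victim])     # erase the victim
    erase_counts[victim] += 1                         # one more P/E cycle
    free_pool.append(victim)                          # erased block rejoins the pool
    return victim, dest
```

The cycle consumes one free block and returns one, so it does not enlarge the pool by itself; what it gains is the victim's INVALID pages, which become writable space in the destination block.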

5. Optimization and Analysis

  • At every stage, independent telemetry events (like WriteEvent, EraseEvent, ECCEvent) are pushed to the central event bus.

  • The AnalyticsEngine consumes these events to compute real-time metrics such as IOPS, latency percentiles, Write Amplification Factor (WAF), and the ECC correction rate.

  • At the end of a workload, the Streamlit dashboard or the Click CLI presents the aggregated metrics for developer review.
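
The event flow above can be sketched as a small publish/subscribe loop. This is an illustrative model only: the EventBus and Analytics classes and the fields of WriteEvent (nbytes, origin, latency_us) are hypothetical, WAF uses the standard definition of bytes written to NAND divided by bytes written by the host, and percentiles use the nearest-rank method.

```python
import math
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class WriteEvent:
    nbytes: int          # bytes programmed to NAND (hypothetical field)
    origin: str          # "host" or "gc" (hypothetical field)
    latency_us: float    # hypothetical field


class EventBus:
    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, event_type, handler):
        self._subscribers[event_type].append(handler)

    def publish(self, event):
        for handler in self._subscribers[type(event)]:
            handler(event)


class Analytics:
    """Hypothetical stand-in for the AnalyticsEngine."""

    def __init__(self, bus):
        self.host_bytes = 0
        self.nand_bytes = 0
        self.latencies = []
        bus.subscribe(WriteEvent, self.on_write)

    def on_write(self, event):
        self.nand_bytes += event.nbytes          # every program counts toward NAND
        if event.origin == "host":
            self.host_bytes += event.nbytes      # only host writes count as demand
        self.latencies.append(event.latency_us)

    def waf(self):
        if self.host_bytes == 0:
            return float("nan")
        return self.nand_bytes / self.host_bytes

    def percentile(self, p):
        if not self.latencies:
            raise ValueError("no latency samples recorded")
        ordered = sorted(self.latencies)
        rank = max(1, math.ceil(p / 100 * len(ordered)))   # nearest-rank method
        return ordered[rank - 1]
```

Keeping the metrics in a subscriber rather than in the FTL or GC code means new measurements can be added without touching the data path.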