Tvarak: Software-managed hardware offload for DAX NVM storage redundancy
Rajat Kateja, Nathan Beckmann, Gregory R. Ganger

TL;DR
Tvarak offloads the update and verification of system-level redundancy (system-checksums and cross-device parity) for DAX NVM storage to a hardware controller co-located with the last-level cache. This protects data against firmware bugs with far less performance overhead than software-only implementations.
Contribution
The paper presents Tvarak, a novel hardware controller that offloads redundancy management for DAX NVM, enabling efficient detection of and recovery from data corruption caused by memory controller and NVM DIMM firmware bugs.
Findings
Tvarak reduces Redis performance overhead to 3%.
Simulation shows Tvarak improves energy efficiency.
Software-only methods force a choice between no data protection and significant performance penalties; Tvarak provides the protection with minimal performance loss.
Abstract
Tvarak efficiently implements system-level redundancy for direct-access (DAX) NVM storage. Production storage systems complement device-level ECC (which covers media errors) with system-checksums and cross-device parity. This system-level redundancy enables detection of and recovery from data corruption due to device firmware bugs (e.g., reading data from the wrong physical location). Direct access to NVM penalizes software-only implementations of system-level redundancy, forcing a choice between lack of data protection and significant performance penalties. Offloading the update and verification of system-level redundancy to Tvarak, a hardware controller co-located with the last-level cache, enables efficient protection of data from such bugs in memory controller and NVM DIMM firmware. Simulation-based evaluation with seven data-intensive applications shows Tvarak's performance and…
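The two jobs the abstract describes offloading, updating redundancy on writes and verifying it on reads, can be illustrated with a minimal Python sketch. It assumes a toy layout that is not taken from the paper: one 64-byte block per simulated DIMM, a CRC32 system-checksum per block stored apart from the data, and one XOR parity block per stripe. The names `RedundantStripe`, `write`, `read`, `raw_read`, and `misdirect_to` are hypothetical and are not Tvarak's interface; Tvarak itself performs this work in hardware next to the last-level cache.

```python
import zlib
from functools import reduce
from typing import Optional

BLOCK = 64  # hypothetical block size in bytes (one cache line)


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length blocks."""
    return bytes(x ^ y for x, y in zip(a, b))


class RedundantStripe:
    """Toy model (assumption): data blocks on separate 'DIMMs', one XOR parity
    block, and a CRC32 system-checksum per data block kept apart from the data."""

    def __init__(self, n_data: int):
        self.data = [bytes(BLOCK) for _ in range(n_data)]
        self.parity = bytes(BLOCK)  # XOR of all-zero blocks is all zeros
        self.checksums = [zlib.crc32(b) for b in self.data]

    def write(self, i: int, new: bytes) -> None:
        """Write path: update checksum and parity alongside the data."""
        assert len(new) == BLOCK
        old = self.data[i]
        # Incremental parity update: P' = P xor old xor new,
        # so the other data blocks in the stripe need not be read.
        self.parity = xor_bytes(self.parity, xor_bytes(old, new))
        self.checksums[i] = zlib.crc32(new)
        self.data[i] = new

    def raw_read(self, i: int, misdirect_to: Optional[int] = None) -> bytes:
        """Device read; misdirect_to simulates a firmware bug that
        returns data from the wrong physical location."""
        return self.data[i if misdirect_to is None else misdirect_to]

    def read(self, i: int, misdirect_to: Optional[int] = None) -> bytes:
        """Read path: verify the system-checksum, rebuild from parity on mismatch."""
        block = self.raw_read(i, misdirect_to)
        if zlib.crc32(block) == self.checksums[i]:
            return block
        # Mismatch detected: reconstruct block i as parity XOR all other blocks.
        others = [self.data[j] for j in range(len(self.data)) if j != i]
        rebuilt = reduce(xor_bytes, others, self.parity)
        if zlib.crc32(rebuilt) != self.checksums[i]:
            raise IOError(f"unrecoverable corruption in block {i}")
        return rebuilt
```

For example, after writing distinct blocks 0, 1, and 2, calling `read(1, misdirect_to=2)` receives block 2's bytes, fails the checksum, and returns the original block 1 rebuilt from parity. The incremental parity update is the design choice that matters here: each write touches only the written block and the parity block, rather than re-reading the whole stripe.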
Taxonomy
Topics: Parallel Computing and Optimization Techniques · Advanced Data Storage Technologies · Distributed systems and fault tolerance
