Addressing multiple bit/symbol errors in DRAM subsystem
Ravikiran Yeleswarapu, Arun K. Somani

TL;DR
This paper proposes SSCMSD, a novel error handling scheme for DRAM that combines ECC with hashing to correct single-symbol errors and detect multi-symbol errors. It improves fault tolerance without increasing read latency.
Contribution
It introduces SSCMSD, a new error correction and detection scheme for DRAM. SSCMSD pairs ECC with a hash so that it corrects single-symbol errors and detects multi-symbol errors, reducing silent data corruptions (SDCs).
Findings
Effectively prevents silent data corruptions in simulations.
Achieves error detection and correction without additional read latency.
Requires 19 chips per rank and additional hash logic at the controller.
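For a rough sense of what 19 chips per rank costs in storage, here is a back-of-the-envelope overhead comparison. It rests on two assumptions of ours, not figures from the paper: a 64-bit data bus built from x4 chips (64 / 4 = 16 data chips), and a conventional 18-chip x4 CHIPKILL rank as the baseline.

```latex
\text{SSCMSD (19 chips): } \frac{19 - 16}{16} = \frac{3}{16} = 18.75\%
\qquad
\text{baseline (18 chips): } \frac{18 - 16}{16} = \frac{2}{16} = 12.5\%
```

Under these assumptions, the rank carries one extra x4 chip. That is 3 × 4 = 12 redundant bits per beat instead of 8.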
Abstract
As DRAM technology continues to evolve towards smaller feature sizes and increased densities, faults in the DRAM subsystem are becoming more severe. Current servers mostly use CHIPKILL-based schemes to tolerate up to one or two symbol errors per DRAM beat. These schemes may fail to detect multi-symbol errors arising from faults in multiple data buses and chips. In this paper, we introduce Single Symbol Correction Multiple Symbol Detection (SSCMSD), a novel error handling scheme that corrects single-symbol errors and detects multi-symbol errors. Our scheme combines a hash with an Error Correcting Code (ECC) to avoid silent data corruptions (SDCs). SSCMSD can also improve the detection of errors in address bits. We implement SSCMSD for an x4-based DDRx system using a 32-bit CRC together with a Reed-Solomon code. Our simulations show that the proposed scheme effectively…
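The abstract describes SSCMSD as a Reed-Solomon code paired with a 32-bit CRC. The Python sketch below illustrates only that general idea, and several of its choices are our assumptions rather than the paper's design:

* 8-bit symbols over GF(2^8).
* A single-symbol-correcting RS code with two check symbols.
* A CRC-32 (via `zlib.crc32`) computed over the data plus an 8-byte address and stored inside the codeword.
* A hypothetical codeword layout that does not model the paper's 19-chip rank.

The helper names `hash_input`, `encode`, and `decode` are ours. The decoder first attempts single-symbol correction and accepts the result only if the CRC matches. Otherwise it reports a detected uncorrectable error (DUE) rather than returning silently corrupted data.

```python
import zlib

# GF(2^8) tables, primitive polynomial x^8+x^4+x^3+x^2+1 (0x11D), alpha = 2.
PRIM = 0x11D
EXP = [0] * 510
LOG = [0] * 256
x = 1
for i in range(255):
    EXP[i] = x
    LOG[x] = i
    x <<= 1
    if x & 0x100:
        x ^= PRIM
for i in range(255, 510):
    EXP[i] = EXP[i - 255]  # duplicated so EXP[LOG[a] + LOG[b]] needs no modulo


def gf_mul(a: int, b: int) -> int:
    if a == 0 or b == 0:
        return 0
    return EXP[LOG[a] + LOG[b]]


def gf_div(a: int, b: int) -> int:
    # Caller guarantees b != 0.
    if a == 0:
        return 0
    return EXP[(LOG[a] - LOG[b]) % 255]


def syndromes(word: list[int]) -> tuple[int, int]:
    """S0 = sum c_i, S1 = sum c_i * alpha^i (RS code with roots alpha^0, alpha^1)."""
    s0 = s1 = 0
    for i, c in enumerate(word):
        s0 ^= c
        s1 ^= gf_mul(c, EXP[i])
    return s0, s1


def hash_input(data: bytes, address: int) -> bytes:
    # Hypothetical: fold the address into the CRC so a wrong-address read fails the check.
    return data + address.to_bytes(8, "little")


def encode(data: bytes, address: int) -> list[int]:
    crc = zlib.crc32(hash_input(data, address))
    msg = list(data) + list(crc.to_bytes(4, "little"))  # CRC is stored and RS-protected
    n = len(msg) + 2
    if n > 255:
        raise ValueError("codeword too long for GF(2^8)")
    a, b = syndromes(msg)   # partial syndromes over the message positions
    p, q = n - 2, n - 1     # positions of the two RS check symbols
    cp = gf_div(b ^ gf_mul(a, EXP[q]), EXP[p] ^ EXP[q])  # chosen so that S1 = 0
    cq = a ^ cp                                          # chosen so that S0 = 0
    return msg + [cp, cq]


def decode(word: list[int], address: int, data_len: int):
    """Return (data, 'clean' | 'corrected') or (None, 'DUE')."""
    word = list(word)
    s0, s1 = syndromes(word)
    status = "clean"
    if s0 or s1:
        if s0 == 0 or s1 == 0:      # cannot come from a single-symbol error
            return None, "DUE"
        pos = LOG[gf_div(s1, s0)]   # alpha^pos = S1 / S0
        if pos >= len(word):        # locator points outside the codeword
            return None, "DUE"
        word[pos] ^= s0             # error magnitude equals S0
        status = "corrected"
    data = bytes(word[:data_len])
    stored = int.from_bytes(bytes(word[data_len:data_len + 4]), "little")
    if zlib.crc32(hash_input(data, address)) != stored:
        return None, "DUE"          # multi-symbol error or RS miscorrection caught by the hash
    return data, status
```

Usage: take `cw = encode(data, addr)` and flip any one byte. The result decodes to `(data, "corrected")`. Flipping two bytes makes the RS step either give up or "correct" the wrong symbol. In the second case, the CRC mismatch turns a would-be SDC into a DUE, except with a residual escape probability on the order of 2^-32. Reading with the wrong address typically fails the CRC as well, which shows one way a hash can strengthen address-error detection.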
Taxonomy
Topics: Radiation Effects in Electronics · Interconnection Networks and Systems · Low-power high-performance VLSI design
