RangeGuard: Efficient, Bounded Approximate Error Correction for Reliable DNNs
Hanum Ko, Sangheum Yeon, Jong Hwan Ko, Jungrae Kim

TL;DR
RangeGuard is a metadata-based error correction framework that encodes numerical range information to efficiently detect and correct errors in DNNs, ensuring reliability with minimal redundancy.
Contribution
The paper introduces RangeGuard, an approach that uses Range Identifiers (RIDs) to provide bounded approximate correction for DNNs with high efficiency and low overhead.
Findings
RangeGuard tolerates 64+ bit flips with only 16 bits of parity.
It maintains DNN accuracy despite frequent memory errors.
RangeGuard effectively bounds error magnitudes within numerical ranges.
Abstract
As DRAM scales in density and adopts 3D integration, raw fault rates increase and multi-bit errors are no longer rare. Such errors can severely impact Deep Neural Networks (DNNs): although DNNs tolerate small numerical perturbations, random bit flips can create extreme outliers that propagate and sharply degrade accuracy. Large Language Models (LLMs) are particularly vulnerable because attention, residual, and normalization layers can amplify and preserve a single corrupted activation across many layers, destabilizing inference. This paper introduces RangeGuard, a metadata-centric error-correcting framework that provides strong reliability and high efficiency based on bounded approximate correction. Instead of protecting raw bits, RangeGuard encodes compact Range Identifiers (RIDs) that capture the numerical range of each value. These compact metadata enable efficient use of limited…
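The idea of bounded approximate correction can be illustrated with a small sketch. This is not the paper's encoding: the abstract does not specify how RIDs are built or protected, so the code below assumes a hypothetical RID made of a value's sign and power-of-two bucket, assumes the RID itself is stored reliably, and assumes nonzero float32 values. The helper names (flip_bit, range_id, bounded_correct) are invented for illustration. The sketch shows how one exponent bit flip produces an extreme outlier, and how knowing only the range lets a decoder substitute a value whose error is bounded by the bucket width.

```python
import math
import struct


def flip_bit(x: float, bit: int) -> float:
    """Flip one bit of the float32 encoding of x (bit 0 = LSB, bit 31 = sign)."""
    (raw,) = struct.unpack("<I", struct.pack("<f", x))
    (out,) = struct.unpack("<f", struct.pack("<I", raw ^ (1 << bit)))
    return out


def range_id(x: float) -> tuple:
    """Hypothetical RID: (sign, e) such that 2**(e-1) <= |x| < 2**e.

    Assumes x is finite and nonzero; the real RID format is not given
    in the abstract.
    """
    _, e = math.frexp(abs(x))
    return (1 if x < 0 else 0, e)


def bounded_correct(value: float, rid: tuple) -> float:
    """Return value if it is consistent with rid, else a stand-in from the range.

    The stand-in is the midpoint of the bucket, so the error against the
    original value is at most 2**(e-2), a quarter of the bucket's upper bound.
    """
    sign, e = rid
    if math.isfinite(value) and value != 0.0 and range_id(value) == rid:
        return value  # in range: any remaining error is already bounded
    midpoint = 0.75 * 2.0 ** e  # centre of [2**(e-1), 2**e)
    return -midpoint if sign else midpoint


# A weight of 0.6 lies in the bucket [0.5, 1.0), so its RID is (0, 0).
original = 0.6
rid = range_id(original)

# Flipping the top exponent bit turns it into a value near 2e38.
outlier = flip_bit(original, 30)

# The decoder cannot recover 0.6, but it can return a value in [0.5, 1.0).
repaired = bounded_correct(outlier, rid)
```

In this sketch a flip in a low mantissa bit leaves the value inside its bucket, so it passes through unchanged with a small error, while a flip that pushes the value outside its bucket (including to NaN or infinity) is replaced by the bucket midpoint. Either way the deviation from the original stays within the bucket, which is the sense of "bounded" used here.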
