RRCD: Redirección de Registros Basada en Compresión de Datos para Tolerar Fallos Permanentes en una GPU
Yamilka Toca-Díaz, Alejandro Valero, Rubén Gran-Tejero, Darío Suárez-Gracia

TL;DR
This paper introduces DC-Patch, a microarchitectural technique that compresses register data to tolerate permanent faults in GPU register files operating below safe voltage levels, significantly reducing energy consumption while maintaining reliability.
Contribution
The paper presents a novel run-time register compression method, DC-Patch, that tolerates permanent faults without compiler changes or instruction set modifications, improving energy efficiency in faulty GPU register files.
Findings
Reduces energy consumption by 47% compared to conventional register files.
Ensures reliable operation with over a third of register entries faulty.
Incurs less than 2% impact on system performance.
Abstract
The ever-increasing parallelism demand of General-Purpose Graphics Processing Unit (GPGPU) applications pushes toward larger and more energy-hungry register files in successive GPU generations. Reducing the supply voltage beyond its safe limit is an effective way to improve the energy efficiency of register files. However, at these operating voltages, the reliability of the circuit is compromised. This work aims to tolerate permanent faults from process variations in large GPU register files operating below the safe supply voltage limit. To do so, this paper proposes a microarchitectural patching technique, DC-Patch, exploiting the inherent data redundancy of applications to compress registers at run-time with neither compiler assistance nor instruction set modifications. Instead of disabling an entire faulty register file entry, DC-Patch leverages the reliable cells within a faulty…
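The idea of fitting compressed register data into the reliable cells of a faulty entry can be illustrated with a small sketch. The abstract does not say which compression scheme DC-Patch uses, so the base-delta encoding below is an assumption chosen only for illustration. The 32-lane, 32-bit entry layout and the function names `delta_width`, `compressed_bits`, and `fits` are likewise hypothetical. The sketch ignores metadata bits and the hardware needed to steer data around faulty cells.

```python
LANES = 32                       # threads per warp (assumed)
WORD_BITS = 32                   # bits per thread register (assumed)
ENTRY_BITS = LANES * WORD_BITS   # bits in one register file entry


def delta_width(values):
    """Smallest two's-complement width holding every delta from values[0]."""
    base = values[0]
    width = 1
    for v in values:
        d = v - base
        # Signed width: magnitude bits plus one sign bit.
        need = d.bit_length() + 1 if d >= 0 else (-d - 1).bit_length() + 1
        width = max(width, need)
    return width


def compressed_bits(values):
    """Bits needed for one base word plus one fixed-width delta per lane.

    Falls back to the uncompressed size when base-delta does not help.
    """
    return min(ENTRY_BITS, WORD_BITS + LANES * delta_width(values))


def fits(values, faulty_bits):
    """True if the compressed register fits in the entry's reliable cells.

    Assumes (hypothetically) that any reliable cell can hold any data bit.
    """
    reliable = ENTRY_BITS - len(set(faulty_bits))
    return compressed_bits(values) <= reliable


# Similar per-thread values, such as consecutive addresses, compress well.
similar = [1000 + i for i in range(LANES)]
# Widely scattered values do not compress under this scheme.
scattered = [(i * 2654435761) & 0xFFFFFFFF for i in range(LANES)]
```

Under these assumptions, `similar` needs 224 bits and still fits in an entry with 300 faulty cells, whereas `scattered` needs all 1024 bits and cannot use an entry with even one faulty cell.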
Taxonomy
Topics: Parallel Computing and Optimization Techniques · Advanced Data Storage Technologies · Interconnection Networks and Systems
