Zero-Space Cost Fault Tolerance for Transformer-based Language Models on ReRAM
Bingbing Li, Geng Yuan, Zigeng Wang, Shaoyi Huang, Hongwu Peng, Payman Behnam, Wujie Wen, Hang Liu, Caiwen Ding

TL;DR
This paper introduces a zero-space cost fault tolerance mechanism for transformer-based language models on ReRAM, combining structure pruning, weight duplication, and MSB embedding to enhance robustness without additional storage.
Contribution
It proposes a fault protection method for ReRAM-based transformers that needs no extra storage: differentiable pruning of rows and columns removes redundancy, and the protection comes from weight duplication with voting plus duplicated MSBs embedded into the model weights.
Findings
Effective fault tolerance on nine GLUE tasks with BERT
No additional storage overhead
Improved robustness against hardware faults
Abstract
Resistive Random Access Memory (ReRAM) has emerged as a promising platform for deep neural networks (DNNs) due to its support for parallel in-situ matrix-vector multiplication. However, hardware failures, such as stuck-at-fault defects, can result in significant prediction errors during model inference. While additional crossbars can be used to address these failures, they come with storage overhead and are not efficient in terms of space, energy, and cost. In this paper, we propose a fault protection mechanism that incurs zero space cost. Our approach includes: 1) differentiable structure pruning of rows and columns to reduce model redundancy, 2) weight duplication and voting for robust output, and 3) embedding duplicated most significant bits (MSBs) into the model weight. We evaluate our method on nine tasks of the GLUE benchmark with the BERT model, and experimental results prove its…
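The third component, embedding duplicated MSBs into the weight and voting at read time, can be pictured with a small sketch. This is only one possible reading of the abstract, not the paper's implementation: the 8-bit weight width, the choice to store two MSB copies in the two least significant bits, and the function names `embed_msb`, `recover_msb`, and `stuck_at` are all assumptions made for illustration.

```python
# Hypothetical sketch: protect the MSB of an 8-bit weight by storing two
# extra copies of it in the two least significant bits (an assumed layout),
# then recover it by majority vote. No extra storage cells are used.

def embed_msb(weight: int) -> int:
    """Overwrite bits 1 and 0 of an 8-bit word with copies of bit 7."""
    msb = (weight >> 7) & 1
    return (weight & 0b11111100) | (msb << 1) | msb

def recover_msb(stored: int) -> int:
    """Majority-vote bit 7 from itself and its copies in bits 1 and 0."""
    votes = ((stored >> 7) & 1) + ((stored >> 1) & 1) + (stored & 1)
    msb = 1 if votes >= 2 else 0
    return (stored & 0b01111111) | (msb << 7)

def stuck_at(stored: int, bit: int, value: int) -> int:
    """Model a stuck-at fault that forces one bit cell to 0 or 1."""
    mask = 1 << bit
    return (stored | mask) if value else (stored & ~mask & 0xFF)

w = 0b10110100                    # original weight code
stored = embed_msb(w)             # low two bits now mirror the MSB
faulty = stuck_at(stored, 7, 0)   # stuck-at-0 fault hits the MSB cell
recovered = recover_msb(faulty)   # voting restores the MSB
```

In this sketch a fault on the MSB cell alone is outvoted by the two copies, so the read value differs from the original weight only in the two low-order bits that were sacrificed to hold the copies.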
Taxonomy
Topics: Ferroelectric and Negative Capacitance Devices · Advanced Memory and Neural Computing · Advanced Neural Network Applications
Methods: Multi-Head Attention · Attention Is All You Need · Residual Connection · Dropout · Linear Layer · Linear Warmup With Linear Decay · Softmax · Pruning · Dense Connections · Weight Decay
