Zero-Space Cost Fault Tolerance for Transformer-based Language Models on ReRAM

Bingbing Li, Geng Yuan, Zigeng Wang, Shaoyi Huang, Hongwu Peng, Payman Behnam, Wujie Wen, Hang Liu, Caiwen Ding

arXiv:2401.11664 · cs.LG · January 23, 2024 · 2 citations

TL;DR

This paper introduces a zero-space cost fault tolerance mechanism for transformer-based language models on ReRAM, combining structure pruning, weight duplication, and MSB embedding to enhance robustness without additional storage.

Contribution

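It proposes a fault protection method that makes ReRAM-based transformers tolerant to hardware faults without extra crossbar storage, by pruning redundant rows and columns, duplicating weights with output voting, and embedding duplicated most significant bits (MSBs) into the model weights.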
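To make the "weight duplication and voting" component concrete, here is a minimal sketch under stated assumptions, not the paper's implementation. Each crossbar column is modeled as a plain list of real-valued weights, a stuck-at fault is modeled by overwriting one weight in one copy, and "voting" is assumed to be the median of three duplicated column outputs. The helper names (`dot`, `inject_stuck_at`, `voted_output`) are hypothetical, and the sketch assumes the extra copies already fit into space freed by structured pruning, which it does not model.

```python
from statistics import median


def dot(weights, x):
    """Idealized crossbar column: dot product of one weight column with input x."""
    return sum(w * xi for w, xi in zip(weights, x))


def inject_stuck_at(weights, index, value):
    """Hypothetical fault model: return a copy of the column with one cell
    stuck at `value` (e.g. 0.0 for a cell stuck at its lowest conductance)."""
    faulty = list(weights)
    faulty[index] = value
    return faulty


def voted_output(copies, x):
    """Assumed voting rule: compute the output from every duplicated column
    and take the median. With three copies, one faulty result cannot pull the
    median outside the range spanned by the two healthy results."""
    return median(dot(c, x) for c in copies)


column = [0.5, -0.25, 0.75, 0.1]
x = [1.0, 2.0, 1.0, 0.0]

healthy = dot(column, x)
# Three copies of the same column; the second copy has a stuck-at fault.
copies = [column, inject_stuck_at(column, 2, 3.0), list(column)]
print(healthy, dot(copies[1], x), voted_output(copies, x))  # 0.75 3.0 0.75
```

The median is used as the numeric "vote" because it needs no agreement threshold and tolerates one arbitrarily wrong copy out of three; the voting rule used in the paper may differ.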

Findings

1. Effective fault tolerance on nine GLUE tasks with BERT
2. Zero additional storage overhead
3. Improved robustness against hardware faults

Abstract

Resistive Random Access Memory (ReRAM) has emerged as a promising platform for deep neural networks (DNNs) due to its support for parallel in-situ matrix-vector multiplication. However, hardware failures, such as stuck-at-fault defects, can result in significant prediction errors during model inference. While additional crossbars can be used to address these failures, they come with storage overhead and are not efficient in terms of space, energy, and cost. In this paper, we propose a fault protection mechanism that incurs zero space cost. Our approach includes: 1) differentiable structure pruning of rows and columns to reduce model redundancy, 2) weight duplication and voting for robust output, and 3) embedding duplicated most significant bits (MSBs) into the model weight. We evaluate our method on nine tasks of the GLUE benchmark with the BERT model, and experimental results prove its…
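The third component, embedding duplicated MSBs into the model weight, can be illustrated at the bit level. The sketch below assumes 8-bit unsigned weight codes with one bit per cell and, for illustration only, assumes the two least significant bits are overwritten with copies of the MSB so that no extra cells are needed; the bit layout and all names (`embed_msb`, `recover`, `stuck_at`, `COPY_BITS`) are hypothetical and not the paper's actual encoding.

```python
MSB = 7             # bit position of the most significant bit in an 8-bit code
COPY_BITS = (1, 0)  # hypothetical: low-order bit positions reused to hold MSB copies


def get_bit(code, pos):
    return (code >> pos) & 1


def set_bit(code, pos, bit):
    return (code & ~(1 << pos)) | (bit << pos)


def embed_msb(code):
    """Overwrite the chosen low-order bits with copies of the MSB, trading
    two bits of precision for redundancy at zero extra storage."""
    msb = get_bit(code, MSB)
    for pos in COPY_BITS:
        code = set_bit(code, pos, msb)
    return code


def recover(code):
    """Majority-vote the MSB from its three stored copies, then clear the
    copy bits so they do not perturb the decoded weight value."""
    votes = get_bit(code, MSB) + sum(get_bit(code, p) for p in COPY_BITS)
    code = set_bit(code, MSB, 1 if votes >= 2 else 0)
    for pos in COPY_BITS:
        code = set_bit(code, pos, 0)
    return code


def stuck_at(code, pos, bit):
    """Simulate a stuck-at fault forcing one stored bit to `bit`."""
    return set_bit(code, pos, bit)


w = 0b10110110                         # 182
protected = embed_msb(w)               # 183
faulty = stuck_at(protected, MSB, 0)   # MSB stuck at 0 -> 55
print(recover(faulty))                 # 180: MSB restored, 2 LSBs of precision lost
print(stuck_at(w, MSB, 0))             # 54: unprotected weight is off by 128
```

Under this assumed layout, a single stuck-at fault on the MSB, which would otherwise shift the weight by 128, costs only the precision of the two sacrificed low-order bits; a single fault on one of the copy bits is likewise outvoted.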

Taxonomy

Topics: Ferroelectric and Negative Capacitance Devices · Advanced Memory and Neural Computing · Advanced Neural Network Applications

Methods: Multi-Head Attention · Attention Is All You Need · Residual Connection · Dropout · Linear Layer · Linear Warmup With Linear Decay · Softmax · Pruning · Dense Connections · Weight Decay