When Small Variations Become Big Failures: Reliability Challenges in Compute-in-Memory Neural Accelerators
Yifan Qin, Jiahao Zheng, Zheyu Yan, Wujie Wen, Xiaobo Sharon Hu, Yiyu Shi

TL;DR
This paper investigates reliability problems that device variability causes in compute-in-memory (CiM) neural accelerators, and presents techniques such as selective write-verify and noise-aware training that improve robustness for safety-critical applications.
Contribution
It introduces SWIM, a selective write-verify method, and a noise-aware training approach, both aimed at improving the reliability of CiM accelerators under device non-idealities.
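To make the idea of selective write-verify concrete, the following is a minimal sketch and not the authors' implementation. It assumes a Gaussian programming error per write and a per-weight `sensitivity` score supplied by the caller; the score, the function names, and the parameters `fraction`, `sigma`, and `tol` are all hypothetical. Only the most sensitive fraction of weights is re-programmed until it reads back within tolerance, and every other weight is written once.

```python
import random

def program_weight(target, sigma, rng):
    """One write pulse: the stored value lands near the target with Gaussian error (assumed model)."""
    return target + rng.gauss(0.0, sigma)

def write_verify(target, sigma, tol, rng, max_iters=50):
    """Re-program until the read-back value is within tol of the target, or max_iters is reached."""
    value = program_weight(target, sigma, rng)
    writes = 1
    while abs(value - target) > tol and writes < max_iters:
        value = program_weight(target, sigma, rng)
        writes += 1
    return value, writes

def selective_write_verify(weights, sensitivity, fraction, sigma, tol, seed=0):
    """Write-verify only the top `fraction` of weights ranked by a caller-supplied sensitivity score."""
    rng = random.Random(seed)
    k = int(round(fraction * len(weights)))
    ranked = sorted(range(len(weights)), key=lambda i: sensitivity[i], reverse=True)
    chosen = set(ranked[:k])
    stored, total_writes = [], 0
    for i, w in enumerate(weights):
        if i in chosen:
            value, n = write_verify(w, sigma, tol, rng)
        else:
            value, n = program_weight(w, sigma, rng), 1  # single unverified write
        stored.append(value)
        total_writes += n
    return stored, total_writes, chosen

if __name__ == "__main__":
    weights = [0.1 * i for i in range(20)]
    sensitivity = list(range(20))  # hypothetical scores: later weights matter more
    stored, total_writes, chosen = selective_write_verify(
        weights, sensitivity, fraction=0.25, sigma=0.1, tol=0.05
    )
    print("verified indices:", sorted(chosen))
    print("total write pulses:", total_writes)
```

The trade-off in this sketch is visible in `total_writes`: raising `fraction` tightens more weights but costs more write pulses, which is the efficiency cost a selective scheme tries to keep small.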
Findings
Small device variations can cause large accuracy drops.
SWIM significantly improves reliability with minimal efficiency loss.
Training with right-censored Gaussian noise enhances worst-case robustness.
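The last finding can be illustrated with a minimal sketch, offered under stated assumptions rather than as the paper's training procedure. Here "right-censored" is read in the standard statistical sense: a Gaussian sample above a threshold is recorded as the threshold itself. The helper names, the linear model, and the values of `sigma`, `threshold`, and `lr` are hypothetical. The noise is added to the weights in the forward pass, and the resulting gradient updates the clean weights.

```python
import random

def right_censored_gauss(sigma, threshold, rng):
    """Zero-mean Gaussian sample; any value above `threshold` is recorded as `threshold`."""
    return min(rng.gauss(0.0, sigma), threshold)

def perturb_weights(weights, sigma, threshold, rng):
    """Return a noisy copy of the weights; the clean weights are left unchanged."""
    return [w + right_censored_gauss(sigma, threshold, rng) for w in weights]

def noisy_sgd_step(weights, x, y, lr, sigma, threshold, rng):
    """One SGD step on 0.5 * (w.x - y)^2, with the forward pass run on noisy weights."""
    noisy = perturb_weights(weights, sigma, threshold, rng)
    pred = sum(w * xi for w, xi in zip(noisy, x))
    err = pred - y
    # The gradient is computed at the noisy weights and applied to the clean ones.
    return [w - lr * err * xi for w, xi in zip(weights, x)]

if __name__ == "__main__":
    rng = random.Random(0)
    weights = [0.0, 0.0]
    data = [([1.0, 0.0], 2.0), ([0.0, 1.0], -1.0)]  # toy targets: w = [2, -1]
    for _ in range(500):
        for x, y in data:
            weights = noisy_sgd_step(weights, x, y, lr=0.05, sigma=0.1, threshold=0.05, rng=rng)
    print("trained weights:", weights)
```

Because the noise distribution is censored on one side, its mean is below zero, so the trained weights in this toy example settle slightly away from the noise-free solution. Whether and how the paper compensates for that shift is not stated in the text above.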
Abstract
Compute-in-memory (CiM) architectures promise significant improvements in energy efficiency and throughput for deep neural network acceleration by alleviating the von Neumann bottleneck. However, their reliance on emerging non-volatile memory devices introduces device-level non-idealities, such as write variability, conductance drift, and stochastic noise, that fundamentally challenge reliability, predictability, and safety, especially in safety-critical applications. This talk examines the reliability limits of CiM-based neural accelerators and presents a series of techniques that bridge device physics, architecture, and learning algorithms to address these challenges. We first demonstrate that even small device variations can lead to disproportionately large accuracy degradation and catastrophic failures in safety-critical inference workloads, revealing a critical gap between…
Taxonomy
Topics: Advanced Memory and Neural Computing · Ferroelectric and Negative Capacitance Devices · Advanced Neural Network Applications
