SafeCiM: Investigating Resilience of Hybrid Floating-Point Compute-in-Memory Deep Learning Accelerators

Swastik Bhattacharya; Sanjay Das; Anand Menon; Shamik Kundu; Arnab Raha; Kanad Basu

arXiv:2512.00059·cs.AR·December 2, 2025

Open Access

TL;DR

This paper investigates how hardware faults affect floating-point compute-in-memory (FP-CiM) deep learning accelerators and proposes SafeCiM, a fault-resilient design that reduces accuracy degradation by up to 49x compared to the baseline, targeting mission-critical AI applications.

Contribution

The paper systematically analyzes fault effects in FP-CiM architectures and introduces SafeCiM, a design that improves the fault tolerance of deep learning accelerators.

Findings

01  Single adder faults can reduce LLM accuracy to 0%

02  SafeCiM reduces accuracy degradation by up to 49x compared to baseline

03  Fault effects vary across computational stages in FP-CiM architectures

Abstract

Deep Neural Networks (DNNs) continue to grow in complexity, with Large Language Models (LLMs) incorporating vast numbers of parameters. Handling these parameters efficiently in traditional accelerators is limited by data-transmission bottlenecks, motivating Compute-in-Memory (CiM) architectures that integrate computation within or near memory to reduce data movement. Recent work has explored CiM designs using Floating-Point (FP) and Integer (INT) operations. FP computations typically deliver higher output quality due to their wider dynamic range and precision, which benefits precision-sensitive Generative AI applications such as LLMs and drives advancements in FP-CiM accelerators. However, the vulnerability of FP-CiM to hardware faults remains underexplored, posing a major reliability concern in mission-critical settings. To address this gap, we systematically…
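To give a sense of why a single fault in a floating-point datapath can be so damaging, the following is a minimal Python sketch. It is not the paper's fault model or the SafeCiM design. It assumes IEEE 754 single precision and a single transient bit flip in a running sum, and the helper names `flip_bit` and `faulty_dot` are hypothetical, introduced only for illustration.

```python
import struct


def flip_bit(value: float, bit: int) -> float:
    """Return `value`, rounded to IEEE 754 float32, with one bit inverted.

    Bit 31 is the sign, bits 30..23 the exponent, bits 22..0 the mantissa.
    """
    if not 0 <= bit < 32:
        raise ValueError("bit must be in [0, 31]")
    # Reinterpret the float32 encoding as an unsigned 32-bit integer.
    (raw,) = struct.unpack("<I", struct.pack("<f", value))
    # Invert the chosen bit and reinterpret the result as a float32.
    (faulty,) = struct.unpack("<f", struct.pack("<I", raw ^ (1 << bit)))
    return faulty


def faulty_dot(weights, activations, fault_index, bit):
    """Dot product whose running sum is corrupted once (illustrative assumption).

    After the multiply-accumulate step `fault_index`, one bit of the
    accumulator is flipped, mimicking a single transient fault in an adder.
    """
    acc = 0.0
    for i, (w, a) in enumerate(zip(weights, activations)):
        acc += w * a
        if i == fault_index:
            acc = flip_bit(acc, bit)
    return acc


if __name__ == "__main__":
    weights = [0.5, 0.25]
    activations = [1.0, 1.0]
    print("fault-free:       ", sum(w * a for w, a in zip(weights, activations)))
    print("mantissa bit 0:   ", faulty_dot(weights, activations, 0, 0))
    print("exponent bit 30:  ", faulty_dot(weights, activations, 0, 30))
```

In this sketch a flip in the lowest mantissa bit barely changes the result, while a flip in the top exponent bit turns a partial sum of 0.5 into 2^127, which swamps every later term. This contrast is one plausible reason a fault's impact depends on where in the computation it occurs.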

Taxonomy

Topics: Advanced Neural Network Applications · Radiation Effects in Electronics · Parallel Computing and Optimization Techniques