ReTern: Exploiting Natural Redundancy and Sign Transformations for Enhanced Fault Tolerance in Compute-in-Memory based Ternary LLMs
Akul Malhotra, Sumeet Kumar Gupta

TL;DR
ReTern improves the fault tolerance of ternary LLMs running on ternary compute-in-memory (TCiM) accelerators by combining fault-aware sign transformations with the natural redundancy of TCiM bit-cells, reducing perplexity under faults at low energy and latency overhead.
Contribution
The paper introduces ReTern, a novel method combining fault-aware sign transformations and bit-cell reprogramming to improve fault tolerance in ternary LLMs on TCiM hardware.
Findings
35% reduction in perplexity under faults
Less than 3% energy overhead
Less than 7% latency overhead
Abstract
Ternary large language models (LLMs), which utilize ternary precision weights and 8-bit activations, have demonstrated competitive performance while significantly reducing the high computational and memory requirements of full-precision LLMs. The energy efficiency and performance of ternary LLMs can be further improved by deploying them on ternary computing-in-memory (TCiM) accelerators, thereby alleviating the von Neumann bottleneck. However, TCiM accelerators are prone to memory stuck-at faults (SAFs), which degrade model accuracy. This is particularly severe for LLMs due to their low weight sparsity. To boost the SAF tolerance of TCiM accelerators, we propose ReTern, which is based on (i) fault-aware sign transformations (FAST) and (ii) TCiM bit-cell reprogramming that exploits the natural redundancy of the bit-cells. The key idea is to utilize FAST to minimize computation errors due to…
Taxonomy
Topics: Fuel Cells and Related Materials · Fault Detection and Control Systems · Brain Tumor Detection and Classification
