FAT: An In-Memory Accelerator with Fast Addition for Ternary Weight Neural Networks
Shien Zhu, Luan H.K. Duong, Hui Chen, Di Liu, Weichen Liu

TL;DR
This paper introduces FAT, an in-memory accelerator optimized for ternary weight neural networks, leveraging sparsity and fast addition techniques to significantly improve speed and energy efficiency over existing solutions.
Contribution
FAT presents novel hardware techniques for TWNs, including a Sparse Addition Control Unit and a memory-based fast addition scheme, to make IMC acceleration of these networks more efficient.
Findings
FAT achieves a 2.00X speedup in addition operations.
FAT improves power efficiency and area efficiency by 22%.
FAT outperforms ParaPIM with a 10X speedup and 12X better energy efficiency on sparse networks.
Abstract
Convolutional Neural Networks (CNNs) demonstrate excellent performance in various applications but have high computational complexity. Quantization is applied to reduce the latency and storage cost of CNNs. Among the quantization methods, Binary and Ternary Weight Networks (BWNs and TWNs) have a unique advantage over 8-bit and 4-bit quantization. They replace the multiplication operations in CNNs with additions, which are favoured on In-Memory-Computing (IMC) devices. IMC acceleration for BWNs has been widely studied. However, though TWNs have higher accuracy and better sparsity than BWNs, IMC acceleration for TWNs has limited research. TWNs on existing IMC devices are inefficient because the sparsity is not well utilized, and the addition operation is not efficient. In this paper, we propose FAT as a novel IMC accelerator for TWNs. First, we propose a Sparse Addition Control Unit,…
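The abstract's two central observations, that ternary weights turn multiplications into additions and that zero weights can be skipped, can be illustrated in software. The following is a minimal sketch only: the function name `ternary_dot` and the operation counter are hypothetical and introduced for illustration, and the code does not model FAT's hardware, its Sparse Addition Control Unit, or its memory-based addition scheme, whose details are not given in this summary.

```python
def ternary_dot(activations, weights):
    """Dot product with weights restricted to {-1, 0, +1}.

    Hypothetical illustration: returns (result, number of add/subtract
    operations actually performed). No multiplication is used.
    """
    if len(activations) != len(weights):
        raise ValueError("activations and weights must have the same length")

    acc = 0
    ops = 0
    for a, w in zip(activations, weights):
        if w == 0:
            # Zero weight contributes nothing, so the operation is skipped.
            continue
        if w == 1:
            acc += a  # +1 weight: plain addition
        elif w == -1:
            acc -= a  # -1 weight: subtraction
        else:
            raise ValueError("weights must be -1, 0, or +1")
        ops += 1
    return acc, ops


# Five activations, two of the five weights are zero.
result, ops = ternary_dot([3, -1, 4, 2, 5], [1, 0, -1, 0, 1])
print(result, ops)
```

In this example only three of the five positions cost an operation. The sparser the weights, the fewer additions remain, which is the property the abstract says existing IMC devices do not exploit well.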
