In-Memory ADC-Based Nonlinear Activation Quantization for Efficient In-Memory Computing
Shuai Dong, Junyi Yang, Biyan Zhou, Hongyang Shang, Gourav Datta, Arindam Basu

TL;DR
This paper presents BS-KMQ, a novel nonlinear quantization method that reduces ADC resolution needs in in-memory computing, leading to significant improvements in accuracy, area, speed, and energy efficiency.
Contribution
Introduces Boundary Suppressed K-Means Quantization (BS-KMQ), a nonlinear (NL) quantization approach that suppresses boundary outliers before clustering to improve ADC efficiency and system performance.
Findings
Achieves a 7x area improvement over prior NL-ADC designs.
Reduces quantization error by at least 3x relative to existing methods on ResNet-18, VGG-16, Inception-V3, and DistilBERT.
Provides up to 24x energy efficiency improvement in system simulations.
Abstract
In deep networks, operations such as ReLU and hardware-driven clamping often cause activations to accumulate near the edges of the distribution, leading to biased clustering and suboptimal quantization in existing nonlinear (NL) quantization methods. This paper introduces Boundary Suppressed K-Means Quantization (BS-KMQ), a novel NL quantization approach designed to reduce the resolution requirements of analog-to-digital converters (ADCs) in in-memory computing (IMC) systems. By suppressing boundary outliers before clustering, BS-KMQ achieves more balanced and informative NL quantization levels. The resulting NL references are implemented using a reconfigurable in-memory NL-ADC, achieving a 7x area improvement over prior NL-ADC designs. When evaluated on ResNet-18, VGG-16, Inception-V3, and DistilBERT, BS-KMQ achieves at least 3x lower quantization error compared to linear, Lloyd-Max,…
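The idea of suppressing boundary samples before clustering can be illustrated with a small sketch. This is not the authors' implementation: it assumes that "suppression" simply means dropping samples that sit exactly on the clamp limits before running a plain 1-D k-means, and it does not address how those boundary samples are represented afterwards. The function names (`kmeans_1d`, `boundary_suppressed_levels`, `quantize`), the clamp limits, and the synthetic activation distribution are all hypothetical choices made for illustration.

```python
import numpy as np

def kmeans_1d(x, k, iters=50):
    """Plain Lloyd's algorithm in 1-D, initialized at evenly spaced quantiles."""
    centers = np.quantile(x, (np.arange(k) + 0.5) / k)
    for _ in range(iters):
        # Assign each sample to its nearest center.
        idx = np.abs(x[:, None] - centers[None, :]).argmin(axis=1)
        for j in range(k):
            members = x[idx == j]
            if members.size:  # an empty cluster keeps its previous center
                centers[j] = members.mean()
    return np.sort(centers)

def boundary_suppressed_levels(acts, k, lo, hi):
    """Hypothetical helper: cluster only samples strictly inside (lo, hi).

    Assumption: samples piled up exactly at the clamp limits are the
    'boundary outliers' and are excluded before clustering.
    """
    interior = acts[(acts > lo) & (acts < hi)]
    return kmeans_1d(interior, k)

def quantize(acts, levels):
    """Map each activation to its nearest quantization level."""
    idx = np.abs(acts[:, None] - levels[None, :]).argmin(axis=1)
    return levels[idx]

rng = np.random.default_rng(0)
# Synthetic activations (an assumption): Gaussian values passed through
# ReLU at 0 and a hardware-style clamp at 2, so mass piles up at both edges.
acts = np.clip(rng.normal(0.5, 1.0, 20000), 0.0, 2.0)

plain = kmeans_1d(acts, 8)
suppressed = boundary_suppressed_levels(acts, 8, 0.0, 2.0)
print("plain k-means levels:      ", np.round(plain, 3))
print("boundary-suppressed levels:", np.round(suppressed, 3))
```

Comparing the two printed level sets shows how the samples stacked at 0 and 2 influence where plain k-means places its centers, and how excluding them changes the placement. A 3-bit (8-level) setting is used here only as an example resolution.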
Taxonomy
Topics: Ferroelectric and Negative Capacitance Devices · Advanced Memory and Neural Computing · Parallel Computing and Optimization Techniques
