Column-wise Quantization of Weights and Partial Sums for Accurate and Efficient Compute-In-Memory Accelerators
Jiyoon Kim, Kang Eun Jeon, Yulhwa Kim, and Jong Hwan Ko

TL;DR
This paper introduces a column-wise weight and partial-sum quantization method for compute-in-memory accelerators, improving accuracy and robustness without adding dequantization overhead, and simplifying training.
Contribution
The paper proposes a column-wise quantization scheme that aligns the granularity of weight quantization with that of partial-sum quantization, improving accuracy and robustness in CIM accelerators.
Findings
Achieved up to 2.69% accuracy improvement on CIFAR datasets.
Demonstrated robustness against memory cell variations.
Provided an open-source CIM convolution framework.
Abstract
Compute-in-memory (CIM) is an efficient method for implementing deep neural networks (DNNs) but suffers from substantial overhead from analog-to-digital converters (ADCs), especially as ADC precision increases. Low-precision ADCs can reduce this overhead but introduce partial-sum quantization errors degrading accuracy. Additionally, low-bit weight constraints, imposed by cell limitations and the need for multiple cells for higher-bit weights, present further challenges. While fine-grained partial-sum quantization has been studied to lower ADC resolution effectively, weight granularity, which limits overall partial-sum quantized accuracy, remains underexplored. This work addresses these challenges by aligning weight and partial-sum quantization granularities at the column-wise level. Our method improves accuracy while maintaining dequantization overhead, simplifies training by removing…
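The idea of aligning weight and partial-sum granularity at the column level can be illustrated with a minimal NumPy sketch. This is not the authors' framework: the function names (`quantize_columnwise`, `quantize_layerwise`, `cim_matvec`) are hypothetical, the quantizer is a plain symmetric uniform one, and the partial-sum scale is calibrated from the batch maximum rather than learned, all of which are assumptions made for illustration. The point it shows is that when each array column has its own weight scale and its own partial-sum scale, the two can be folded into a single per-column multiplier at dequantization.

```python
import numpy as np

def quantize_columnwise(x, bits):
    """Symmetric uniform quantization with one scale per column of x (assumed scheme)."""
    qmax = 2 ** (bits - 1) - 1
    # One scale per column; the floor guards against all-zero columns.
    scale = np.maximum(np.abs(x).max(axis=0, keepdims=True), 1e-12) / qmax
    q = np.clip(np.round(x / scale), -qmax, qmax)
    return q, scale  # q: same shape as x, scale: (1, cols)

def quantize_layerwise(x, bits):
    """Baseline for comparison: a single scale shared by every column."""
    qmax = 2 ** (bits - 1) - 1
    scale = max(np.abs(x).max(), 1e-12) / qmax
    q = np.clip(np.round(x / scale), -qmax, qmax)
    return q, np.full((1, x.shape[1]), scale)

def cim_matvec(acts, w_q, w_scale, adc_bits):
    """Model one CIM array: each column accumulates a partial sum that a
    low-precision ADC digitizes with its own (per-column) scale."""
    psum = acts @ w_q                                    # (batch, cols) column sums
    psum_q, psum_scale = quantize_columnwise(psum, adc_bits)
    # Weight scale and partial-sum scale share the column granularity,
    # so dequantization is a single per-column multiply.
    return psum_q * (psum_scale * w_scale)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Columns with very different magnitudes, where a shared scale hurts most.
    w = rng.standard_normal((64, 8)) * np.logspace(-2, 0, 8)
    acts = rng.integers(0, 4, size=(16, 64)).astype(float)
    for name, quantizer in [("layer-wise", quantize_layerwise),
                            ("column-wise", quantize_columnwise)]:
        w_q, w_scale = quantizer(w, bits=3)
        out = cim_matvec(acts, w_q, w_scale, adc_bits=4)
        print(name, "output MSE:", np.mean((out - acts @ w) ** 2))
```

The script compares a layer-wise weight scale against column-wise weight scales under the same simulated ADC precision. The per-column multiply in `cim_matvec` is the same operation in both cases, which is one way to read the abstract's claim that dequantization overhead is maintained.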
Taxonomy
Topics: Parallel Computing and Optimization Techniques · Neural Networks and Applications · Distributed and Parallel Computing Systems
Methods: Convolution
