Column-wise Quantization of Weights and Partial Sums for Accurate and Efficient Compute-In-Memory Accelerators

Jiyoon Kim; Kang Eun Jeon; Yulhwa Kim; and Jong Hwan Ko

arXiv:2502.07842·cs.AR·March 14, 2025

TL;DR

This paper introduces a column-wise weight and partial-sum quantization method for compute-in-memory (CIM) accelerators. It improves accuracy and robustness without adding dequantization overhead, and it simplifies training.

Contribution

It proposes a novel column-wise quantization scheme that aligns weight and partial-sum granularity, enhancing accuracy and robustness in CIM accelerators.

Findings

01. Achieved up to 2.69% accuracy improvement on CIFAR datasets.

02. Demonstrated robustness against memory cell variations.

03. Provided an open-source CIM convolution framework.

Abstract

Compute-in-memory (CIM) is an efficient method for implementing deep neural networks (DNNs) but suffers from substantial overhead from analog-to-digital converters (ADCs), especially as ADC precision increases. Low-precision ADCs can reduce this overhead but introduce partial-sum quantization errors degrading accuracy. Additionally, low-bit weight constraints, imposed by cell limitations and the need for multiple cells for higher-bit weights, present further challenges. While fine-grained partial-sum quantization has been studied to lower ADC resolution effectively, weight granularity, which limits overall partial-sum quantized accuracy, remains underexplored. This work addresses these challenges by aligning weight and partial-sum quantization granularities at the column-wise level. Our method improves accuracy while maintaining dequantization overhead, simplifies training by removing…
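To make the idea of aligned granularities concrete, the following is a minimal sketch in plain Python, not the authors' implementation. It assumes symmetric quantization, derives each column's weight scale from that column's maximum magnitude, and takes the ADC (partial-sum) scales as fixed inputs; a trained model would more likely learn these scales. The function names `quantize_columns` and `cim_column_outputs` and all numeric values are hypothetical. Because both scales belong to the same column, they fold into a single multiplier per column at dequantization.

```python
def quantize_columns(weights, bits):
    """Symmetric quantization with one scale per column (assumed max-based)."""
    qmax = 2 ** (bits - 1) - 1
    n_rows, n_cols = len(weights), len(weights[0])
    q = [[0] * n_cols for _ in range(n_rows)]
    scales = []
    for j in range(n_cols):
        col_max = max(abs(weights[i][j]) for i in range(n_rows))
        scale = col_max / qmax if col_max > 0 else 1.0
        scales.append(scale)
        for i in range(n_rows):
            level = round(weights[i][j] / scale)
            q[i][j] = max(-qmax, min(qmax, level))
    return q, scales


def cim_column_outputs(inputs, q_weights, w_scales, adc_scales, adc_bits):
    """Integer partial sum per column, ADC quantization, then one dequant multiply."""
    adc_max = 2 ** (adc_bits - 1) - 1
    outputs = []
    for j, (w_s, p_s) in enumerate(zip(w_scales, adc_scales)):
        # Analog accumulation along column j, modeled here as an integer dot product.
        psum = sum(x * row[j] for x, row in zip(inputs, q_weights))
        # Low-precision ADC: scale, round, and clip the partial sum.
        code = max(-adc_max, min(adc_max, round(psum / p_s)))
        # Weight scale and partial-sum scale share the column, so they merge.
        outputs.append(code * (p_s * w_s))
    return outputs


# Hypothetical 3x2 array: rows are inputs, columns are crossbar columns.
weights = [[0.5, 0.03], [-0.2, 0.01], [0.1, -0.04]]
q_weights, w_scales = quantize_columns(weights, bits=4)
outputs = cim_column_outputs([1, 0, 1], q_weights, w_scales,
                             adc_scales=[2.0, 1.0], adc_bits=4)
print(q_weights, outputs)
```

In this toy example the two columns differ in magnitude by more than an order of magnitude, so a single shared weight scale would push most of the second column toward zero. Per-column scales let each column use its full integer range.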

Code & Models

Repositories

jiyoonkm/columnquant (PyTorch, official)

Taxonomy

Topics: Parallel Computing and Optimization Techniques · Neural Networks and Applications · Distributed and Parallel Computing Systems

Methods: Convolution