Topkima-Former: Low-energy, Low-Latency Inference for Transformers using top-k In-memory ADC
Shuai Dong, Junyi Yang, Xiaoqi Peng, Hongyang Shang, Ye Ke, Xiaofeng Yang, Hongjie Liu, Arindam Basu

TL;DR
Topkima-Former introduces a low-energy, low-latency softmax approximation using top-k in-memory ADC, significantly accelerating transformer inference with minimal accuracy loss across NLP and CV models.
Contribution
It proposes a novel in-memory ADC-based top-k softmax implementation, combined with architectural improvements, enabling faster and more energy-efficient transformer inference.
Findings
Achieves 1.8x-84x speedup over prior IMC accelerators.
Reduces energy consumption by 1.3x-35x compared to previous solutions.
Maintains less than 1.2% accuracy loss on key NLP and CV benchmarks.
Abstract
The transformer model has gained prominence as a popular deep neural network architecture for natural language processing (NLP) and computer vision (CV) applications. However, the extensive use of nonlinear operations such as softmax poses a performance bottleneck during transformer inference, accounting for up to 40% of the total latency. Hence, we propose innovations at the circuit, architecture, and algorithm levels to accelerate the transformer. At the circuit level, we propose topkima, which combines top-k activation selection with in-memory ADC (IMA) to implement a low-energy, low-latency softmax without any sorting latency. Only the k largest activations are sent to the softmax calculation block, which greatly reduces the computational cost of softmax. Using a modified training scheme with top-k only in the forward pass, experimental results demonstrate only a 0.4% to 1.2% reduction in accuracy…
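To make the top-k idea concrete, here is a minimal software sketch of a softmax restricted to the k largest activations. It is an illustration only, not the authors' circuit: the function name `topk_softmax` and the example scores are hypothetical, and the selection step uses an ordinary software sort, whereas the abstract describes selection inside the in-memory ADC precisely so that no sorting latency is incurred.

```python
import math

def topk_softmax(scores, k):
    """Softmax over the k largest scores; all other positions get probability 0.

    Hypothetical helper for illustration; not taken from the paper.
    """
    if not 1 <= k <= len(scores):
        raise ValueError("k must be between 1 and len(scores)")
    # Software stand-in for the hardware selection step: indices of the k largest scores.
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    # Subtract the maximum before exponentiating for numerical stability.
    m = max(scores[i] for i in top)
    exps = {i: math.exp(scores[i] - m) for i in top}
    total = sum(exps.values())
    out = [0.0] * len(scores)
    for i, e in exps.items():
        out[i] = e / total
    return out

# Example: only the two largest scores (indices 3 and 0) receive nonzero probability.
probs = topk_softmax([2.0, -1.0, 0.5, 3.0, 0.0], k=2)
```

With k much smaller than the row length, only k exponentials and one k-term sum are evaluated per row, which is the source of the savings the abstract refers to.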
Taxonomy
Topics: Advanced Memory and Neural Computing · Semiconductor materials and devices · Ferroelectric and Negative Capacitance Devices
