ABI: A tightly integrated, unified, sparsity-aware, reconfigurable, compute near-register file/cache GPU architecture with light-weight softmax for deep learning, linear algebra, and Ising compute
Siddhartha Raman Sundara Raman, Jaydeep P. Kulkarni

TL;DR
This paper introduces ABI, a unified near-memory GPU architecture that uses custom sparsity-aware circuits and reconfigurable compute to deliver 6 to 16 times speedup and 6 to 13 times energy savings over MIAOW GPU on workloads spanning deep learning, linear algebra, and Ising compute.
Contribution
ABI presents a tightly integrated near-memory GPU design that combines a sparsity-aware compute circuit, a lightweight softmax circuit, and reconfigurable compute up to INT16 with dynamic resolution updates, with speedup and energy savings measured against MIAOW GPU.
Findings
6 to 16 times speedup over MIAOW GPU
6 to 13 times energy savings compared to MIAOW GPU
About 4.5 times speedup for ABI-enabled MI300 and Blackwell systems over baseline MI300 and Blackwell
Abstract
We present a tightly integrated and unified near-memory GPU architecture that delivers 6 to 16 times speedup and 6 to 13 times energy savings across Convolutional Neural Networks, Graph Convolutional Networks, Linear Programming, Large Language Models, and Ising workloads compared to MIAOW GPU. The design includes a custom sparsity-aware near-memory circuit providing about 1.5 times energy savings, and a lightweight softmax circuit providing about 1.6 times energy savings. The architecture supports reconfigurable compute up to INT16 with dynamic resolution updates and scales efficiently across problem sizes. ABI-enabled MI300 and Blackwell systems achieve about 4.5 times speedup over baseline MI300 and Blackwell.
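To give a feel for what "sparsity-aware" compute means in general, the following is a minimal software sketch of zero-skipping in an integer dot product. It is not the ABI circuit: the function name `sparse_aware_dot`, the sample vectors, and the range check are illustrative assumptions, and the abstract does not describe how the hardware detects or skips zeros. The sketch only shows that skipping multiply-accumulates with a zero operand leaves the result unchanged while reducing the number of operations performed.

```python
def sparse_aware_dot(activations, weights, bits=16):
    """Signed-integer dot product that skips MACs with a zero operand.

    Hypothetical illustration only; `bits=16` mirrors the INT16 upper
    bound mentioned in the abstract, not a documented interface.
    Returns (accumulated sum, number of MACs actually performed).
    """
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    acc = 0
    macs_done = 0
    for a, w in zip(activations, weights):
        if not (lo <= a <= hi and lo <= w <= hi):
            raise ValueError(f"operand out of INT{bits} range: {a}, {w}")
        if a == 0 or w == 0:
            continue  # zero product contributes nothing, so skip the MAC
        acc += a * w
        macs_done += 1
    return acc, macs_done


# Assumed sample data: 8 operand pairs, 5 of which contain a zero.
acts = [0, 3, 0, -2, 5, 0, 0, 1]
wts = [4, 0, 7, 6, -1, 2, 0, 9]

result, macs = sparse_aware_dot(acts, wts)
dense = sum(a * w for a, w in zip(acts, wts))
print(result, macs, dense)
```

In this toy input only 3 of the 8 multiply-accumulates are executed, and the result matches the dense computation. How much energy such skipping saves in hardware depends on the circuit and the workload's sparsity; the roughly 1.5 times figure in the abstract is the paper's measurement, not something this sketch reproduces.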
