ABI: A tightly integrated, unified, sparsity-aware, reconfigurable, compute near-register file/cache GPU architecture with light-weight softmax for deep learning, linear algebra, and Ising compute

Siddhartha Raman Sundara Raman; Jaydeep P. Kulkarni

arXiv:2602.14262 · cs.AR · April 7, 2026

TL;DR

This paper introduces ABI, a unified near-memory GPU architecture that reports 6 to 16 times speedup and 6 to 13 times energy savings over the MIAOW GPU across deep learning, linear algebra, and Ising workloads, using custom sparsity-aware circuits, a lightweight softmax circuit, and reconfigurable compute.

Contribution

ABI presents an integrated near-memory GPU design that combines a sparsity-aware near-memory circuit (about 1.5 times energy savings), a lightweight softmax circuit (about 1.6 times energy savings), and reconfigurable compute up to INT16 with dynamic resolution updates.

Findings

01

6 to 16 times speedup over MIAOW GPU

02

6 to 13 times energy savings compared to MIAOW GPU

03

About 4.5 times speedup for ABI-enabled MI300 and Blackwell systems over baseline MI300 and Blackwell

Abstract

We present a tightly integrated and unified near-memory GPU architecture that delivers 6 to 16 times speedup and 6 to 13 times energy savings across Convolutional Neural Networks, Graph Convolutional Networks, Linear Programming, Large Language Models, and Ising workloads compared to MIAOW GPU. The design includes a custom sparsity-aware near-memory circuit providing about 1.5 times energy savings, and a lightweight softmax circuit providing about 1.6 times energy savings. The architecture supports reconfigurable compute up to INT16 with dynamic resolution updates and scales efficiently across problem sizes. ABI-enabled MI300 and Blackwell systems achieve about 4.5 times speedup over baseline MI300 and Blackwell.
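To make the term "sparsity-aware" concrete, here is a minimal software sketch of the general idea: skip a multiply-accumulate whenever either operand is zero, and count how much work was avoided. This illustrates zero-skipping in general and is not the paper's circuit; the function name `sparse_aware_dot` and its return format are hypothetical, and the abstract does not describe how ABI detects or exploits sparsity.

```python
def sparse_aware_dot(activations, weights):
    """Dot product that skips multiply-accumulates with a zero operand.

    Illustrative assumption only: models zero-skipping in software and
    reports how many MACs were performed versus skipped.
    """
    if len(activations) != len(weights):
        raise ValueError("operand vectors must have equal length")

    acc = 0
    macs_done = 0
    macs_skipped = 0
    for a, w in zip(activations, weights):
        # A zero operand contributes nothing to the sum, so the
        # multiply-accumulate can be skipped entirely.
        if a == 0 or w == 0:
            macs_skipped += 1
            continue
        acc += a * w
        macs_done += 1
    return acc, macs_done, macs_skipped


if __name__ == "__main__":
    result = sparse_aware_dot([1, 0, 3, 0], [2, 5, 0, 7])
    print(result)  # (2, 1, 3): one MAC performed, three skipped
```

In this toy input, three of the four operand pairs contain a zero, so only one multiplication is executed. Any energy saving in real hardware depends on how cheaply zeros are detected, which this sketch does not model.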
