A Digital SRAM-Based Compute-In-Memory Macro for Weight-Stationary Dynamic Matrix Multiplication in Transformer Attention Score Computation

Jianyi Yu; Tengxiao Wang; Yuxuan Wang; Xiang Fu; Fei Qiao; Ying Wang; Rui Yuan; Liyuan Liu; Cong Shi

arXiv:2511.12152 · cs.AR · December 15, 2025

TL;DR

This paper presents a digital compute-in-memory macro for Transformer attention score computation. It achieves high energy and area efficiency by recasting the dynamic matrix multiplication as a static one and by skipping zero-value bits.

Contribution

It introduces a novel 2-input static matrix multiplication method with zero-value bit skipping, significantly improving energy and area efficiency in Transformer CIM implementations.

Findings

01. Achieves 42.27 GOPS at 1.24 mW in a 65-nm process

02. Delivers 34.1 TOPS/W energy efficiency

03. Outperforms CPUs and GPUs by 25x and 13x, respectively
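
As a consistency check on the first two findings, the short sketch below divides the reported throughput by the reported power. It assumes both figures refer to the same operating point, which this summary does not state. The helper name `tops_per_watt` is made up for illustration.

```python
def tops_per_watt(gops, milliwatts):
    """Convert a throughput in GOPS and a power in mW to TOPS/W."""
    ops_per_second = gops * 1e9   # GOPS -> operations per second
    watts = milliwatts * 1e-3     # mW -> W
    return ops_per_second / watts / 1e12  # ops/s/W -> TOPS/W


# Figures taken from the findings list; pairing them is an assumption.
efficiency = tops_per_watt(42.27, 1.24)
print(f"{efficiency:.1f} TOPS/W")
```

Under that assumption the quotient rounds to the 34.1 TOPS/W quoted in the second finding.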

Abstract

Compute-in-memory (CIM) techniques are widely employed in energy-efficient artificial intelligence (AI) processors. They alleviate the power and latency bottlenecks caused by extensive data movement between compute and storage units. To extend these benefits to Transformers, this brief proposes a digital CIM macro that computes attention scores. To eliminate dynamic matrix multiplication (MM), we reconstruct the computation as a static MM using a combined QK-weight matrix, so that inputs can be fed directly to a single CIM macro to obtain the score results. However, this introduces a new challenge: 2-input static MM. The computation is therefore further decomposed into four groups of bit-serial logical and addition operations. This allows the two inputs to directly activate the word line via an AND gate, thus realizing 2-input static MM with minimal overhead. A hierarchical zero-value bit skipping mechanism is…
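
The reconstruction described in the abstract can be illustrated in a few lines of plain Python. This is a minimal sketch, not the authors' implementation. It assumes the scores are S = Q K^T with Q = X W_Q and K = X W_K, so that the combined matrix is W_QK = W_Q W_K^T and S = X W_QK X^T. All function names, shapes and bit widths are hypothetical. The bit-serial routine handles unsigned inputs only and does not reproduce the paper's four-group decomposition or its zero-value bit skipping, whose details are not given in the truncated abstract.

```python
import random


def matmul(a, b):
    """Plain-Python product of two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    return [list(col) for col in zip(*m)]


def scores_dynamic(x, w_q, w_k):
    """Conventional path: Q and K are produced at run time, then multiplied."""
    q = matmul(x, w_q)
    k = matmul(x, w_k)
    return matmul(q, transpose(k))


def scores_static(x, w_qk):
    """Reconstructed path: only the fixed matrix W_QK is stored as a weight."""
    return matmul(matmul(x, w_qk), transpose(x))


def two_input_bit_serial(xi, xj, w_qk, bits):
    """One score entry S[i][j] for unsigned inputs, one bit pair at a time.

    For bit positions (p, q), the AND of bit p of xi[a] and bit q of xj[b]
    decides whether W_QK[a][b] is accumulated. Each partial sum is then
    weighted by 2**(p + q).
    """
    total = 0
    for p in range(bits):
        for q in range(bits):
            partial = 0
            for a, xa in enumerate(xi):
                for b, xb in enumerate(xj):
                    if ((xa >> p) & 1) & ((xb >> q) & 1):  # 2-input AND
                        partial += w_qk[a][b]
            total += partial * (1 << (p + q))
    return total


if __name__ == "__main__":
    random.seed(0)
    n, d, h, bits = 3, 4, 2, 4  # tokens, model dim, head dim, input bit width
    x = [[random.randrange(1 << bits) for _ in range(d)] for _ in range(n)]
    w_q = [[random.randint(-3, 3) for _ in range(h)] for _ in range(d)]
    w_k = [[random.randint(-3, 3) for _ in range(h)] for _ in range(d)]
    w_qk = matmul(w_q, transpose(w_k))  # computed once, ahead of time

    static = scores_static(x, w_qk)
    assert static == scores_dynamic(x, w_q, w_k)
    assert all(
        two_input_bit_serial(x[i], x[j], w_qk, bits) == static[i][j]
        for i in range(n)
        for j in range(n)
    )
    print(static)
```

In this sketch the input matrix appears on both sides of the stored weight, which is what makes the multiplication "2-input": each accumulation of a weight is gated by one bit from each input vector.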

Taxonomy

Topics: Parallel Computing and Optimization Techniques · Ferroelectric and Negative Capacitance Devices · Low-power high-performance VLSI design