MXFormer: A Microscaling Floating-Point Charge-Trap Transistor Compute-in-Memory Transformer Accelerator
George Karfakis, Samyak Chakrabarty, Vinod Kurian Jacob, Siyun Qiao, Subramanian S. Iyer, Sudhakar Pamarti, Puneet Gupta

TL;DR
MXFormer is a novel compute-in-memory Transformer accelerator using charge-trap transistors, achieving high throughput and efficiency by eliminating weight movement and enabling on-chip storage of large models.
Contribution
It introduces a hybrid, weight-stationary CIM architecture with ultra-dense charge-trap transistors, enabling large model storage and high-speed inference without model retraining.
Findings
Processes 58275 FPS on ViT-L/32 (dual-chip)
Outperforms state-of-the-art accelerators by up to 60.5x in compute density
Improves energy efficiency by 1.7x-2.5x
Abstract
The proliferation of Transformer models is often constrained by the significant computational and memory bandwidth demands of deployment. To address this, we present MXFormer, a novel, hybrid, weight-stationary Compute-in-Memory (CIM) accelerator that provides high throughput and efficiency for fixed-model inference on large short-sequence Transformers. Our architecture's foundation is the use of ultra-dense Charge-Trap Transistors (CTTs) in Microscaling MXFP4 CIM arrays, uniquely enabling the on-chip storage of up to hundreds of millions of parameters in Fully Weight Stationary (FWS) fashion. We introduce a statically partitioned design with 12 Transformer blocks connected by a deeply pipelined dataflow. Static-weight layers (MLPs and linear projections) execute on highly parallel analog CTT arrays using an MXFP4-native flow with per-block exponent alignment and a 10-bit SAR ADC.…
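To make the "MXFP4 with per-block exponent" idea concrete, here is a minimal sketch of block quantization in the style of the OCP Microscaling (MX) format: each block shares one power-of-two scale, and each element is stored as a 4-bit E2M1 value. This is an illustration under stated assumptions, not the paper's hardware flow: the function names are hypothetical, the shared exponent is chosen with the common "floor(log2(max)) minus element emax" rule, and rounding is simple nearest-value with ties going to the smaller magnitude.

```python
import math

# Magnitudes representable by an FP4 E2M1 element (sign is stored separately).
FP4_E2M1_GRID = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
E2M1_EMAX = 2      # exponent of the largest E2M1 value: 6.0 = 1.5 * 2**2
BLOCK_SIZE = 32    # MX block size; the sketch accepts shorter blocks too


def quantize_block_mxfp4(block):
    """Return (shared_exp, elems) for one block (hypothetical helper).

    shared_exp is the power-of-two exponent shared by the whole block;
    elems are signed E2M1 grid values such that x ~= elem * 2**shared_exp.
    """
    max_abs = max(abs(x) for x in block)
    if max_abs == 0.0:
        return 0, [0.0] * len(block)

    # frexp gives max_abs = m * 2**e with 0.5 <= m < 1, so floor(log2) = e - 1.
    _, e = math.frexp(max_abs)
    shared_exp = (e - 1) - E2M1_EMAX
    # Assumption: clamp to the exponent range of an 8-bit E8M0 scale.
    shared_exp = max(-127, min(127, shared_exp))
    scale = 2.0 ** shared_exp

    elems = []
    for x in block:
        # Saturate to the largest representable magnitude, then snap to grid.
        mag = min(abs(x) / scale, FP4_E2M1_GRID[-1])
        q = min(FP4_E2M1_GRID, key=lambda g: abs(g - mag))
        elems.append(math.copysign(q, x))
    return shared_exp, elems


def dequantize_block_mxfp4(shared_exp, elems):
    """Reconstruct approximate real values from a quantized block."""
    scale = 2.0 ** shared_exp
    return [v * scale for v in elems]


if __name__ == "__main__":
    exp, q = quantize_block_mxfp4([1.0, -0.5, 3.0, 0.25])
    print(exp, q, dequantize_block_mxfp4(exp, q))
```

Because every element in a block shares one exponent, a dot product over the block reduces to low-precision element multiplies followed by a single scale, which is the property that makes the format attractive for array-based accumulation.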
Taxonomy
Topics: Neural Networks and Reservoir Computing · Parallel Computing and Optimization Techniques · Ferroelectric and Negative Capacitance Devices
