XtraMAC: An Efficient MAC Architecture for Mixed-Precision LLM Inference on FPGA

Feng Yu; Hongshi Tan; Yao Chen; Weng-Fai Wong; Bingsheng He

arXiv:2605.06052·cs.AR·May 8, 2026

TL;DR

XtraMAC is a datatype-adaptive FPGA MAC architecture for mixed-precision LLM inference. It reports 1.4-2.0x higher compute density, 27-51% lower LUT, FF, and DSP consumption, and up to 1.9x better energy efficiency.

Contribution

The paper introduces a datatype-adaptive MAC microarchitecture that unifies integer, floating-point, and mixed-precision operations, sharing resources across datatypes while keeping latency constant.

Findings

01 Achieves 1.4-2.0x higher compute density on FPGA.

02 Reduces LUT, FF, and DSP consumption by 27-51%.

03 Delivers up to 1.9x higher energy efficiency and 1.2x speedup.

Abstract

The widespread adoption of mixed-precision quantization in large language models (LLMs) has created demand for hardware that can efficiently perform multiply-accumulate (MAC) operations across mixed datatypes and switch datatypes at runtime. Existing FPGA-based MAC solutions fall short due to limitations in fixed-datatype design, inefficient spatial or temporal resource sharing, and poor support for mixed-precision execution. These limitations collectively lead to under-utilization of DSP resources, limiting achievable parallelism and throughput. In this work, we present XtraMAC, a novel MAC architecture that unifies integer, floating-point, and mixed-precision operations within a single, datatype-adaptive microarchitecture. XtraMAC decomposes all supported MAC formats into a shared integer mantissa product with lightweight sign and exponent handling, enabling dynamic operand packing…
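
The decomposition described in the abstract can be illustrated with a small software sketch. This is not the paper's microarchitecture: it only shows the general idea that an FP16 value and an integer can both be reduced to a (sign, exponent, integer mantissa) triple, so that one integer multiplier serves every format while sign and exponent are handled by an XOR and an add. The function names (`fp16_fields`, `int_fields`, `shared_int_mul`, `mac`) are hypothetical, the accumulator is a plain Python float standing in for whatever accumulator format the hardware uses, and infinities and NaNs are deliberately left out.

```python
import math
import struct

def fp16_fields(x):
    """Round x to IEEE 754 binary16 and return (sign, exponent, mantissa)
    such that value == (-1)**sign * mantissa * 2**exponent."""
    (bits,) = struct.unpack("<H", struct.pack("<e", x))
    sign = bits >> 15
    exp = (bits >> 10) & 0x1F
    frac = bits & 0x3FF
    if exp == 0x1F:
        raise ValueError("inf/NaN are not handled in this sketch")
    if exp == 0:
        # Subnormal: no hidden bit, fixed scale of 2**-24.
        return sign, -24, frac
    # Normal: restore the hidden bit; bias 15 plus 10 fraction bits.
    return sign, exp - 25, frac | 0x400

def int_fields(n):
    """Express an integer in the same triple form (exponent is zero)."""
    return (1 if n < 0 else 0), 0, abs(n)

def shared_int_mul(a, b):
    """Stand-in for the single integer multiplier shared by all formats."""
    return a * b

def mac(acc, a, b):
    """acc + a*b, where a and b are (sign, exponent, mantissa) triples."""
    sa, ea, ma = a
    sb, eb, mb = b
    sign = sa ^ sb                   # lightweight sign handling
    exp = ea + eb                    # lightweight exponent handling
    mant = shared_int_mul(ma, mb)    # the only multiplication
    product = math.ldexp(mant, exp)  # exact for these operand widths
    return acc + (-product if sign else product)

# FP16 x FP16, then INT x FP16, through the same datapath.
acc = mac(0.0, fp16_fields(1.5), fp16_fields(-2.25))
acc = mac(acc, int_fields(-3), fp16_fields(0.5))
```

Because every operand is reduced to the same triple before multiplication, switching datatypes in this sketch only changes which `*_fields` helper is called, not the multiply path itself.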

Code & Models

Repositories

Xtra-Computing/XtraMAC (GitHub)
