TL;DR
XtraMAC is an FPGA multiply-accumulate (MAC) architecture for mixed-precision LLM inference that reports 1.4-2.0x higher compute density, 27-51% lower LUT, FF, and DSP consumption, and up to 1.9x better energy efficiency.
Contribution
XtraMAC introduces a datatype-adaptive MAC microarchitecture that unifies integer, floating-point, and mixed-precision operations in a single design, with efficient resource sharing and constant latency.
Findings
Achieves 1.4-2.0x higher compute density on FPGA.
Reduces LUT, FF, and DSP consumption by 27-51%.
Delivers up to 1.9x higher energy efficiency and a 1.2x speedup.
Abstract
The widespread adoption of mixed-precision quantization in large language models (LLMs) has created demand for hardware that can efficiently perform multiply-accumulate (MAC) operations across mixed datatypes and switch datatypes at runtime. Existing FPGA-based MAC solutions fall short due to limitations in fixed-datatype design, inefficient spatial or temporal resource sharing, and poor support for mixed-precision execution. These limitations collectively lead to under-utilization of DSP resources, limiting achievable parallelism and throughput. In this work, we present XtraMAC, a novel MAC architecture that unifies integer, floating-point, and mixed-precision operations within a single, datatype-adaptive microarchitecture. XtraMAC decomposes all supported MAC formats into a shared integer mantissa product with lightweight sign and exponent handling, enabling dynamic operand packing…
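The decomposition described in the abstract (a shared integer mantissa product plus lightweight sign and exponent handling) can be illustrated with a small software model. This is a minimal sketch and not the XtraMAC hardware design: the function names `decompose_float`, `decompose_int`, and `mac_step` are hypothetical, the 10-bit mantissa width is an illustrative assumption (it matches the FP16 fraction width), and the accumulator is an ordinary Python float rather than a hardware accumulator.

```python
import math

def decompose_float(x: float, mant_bits: int = 10):
    """Return (sign, exp, mant) with x ~= (-1)**sign * mant * 2**exp.

    mant_bits is an illustrative assumption, not a value from the paper.
    """
    if x == 0.0:
        return 0, 0, 0
    sign = 1 if x < 0 else 0
    frac, exp = math.frexp(abs(x))               # abs(x) = frac * 2**exp, 0.5 <= frac < 1
    mant = round(frac * (1 << (mant_bits + 1)))  # integer mantissa incl. leading bit
    return sign, exp - (mant_bits + 1), mant

def decompose_int(x: int):
    """An integer operand is a mantissa with exponent 0."""
    return (1 if x < 0 else 0), 0, abs(x)

def mac_step(acc: float, a, b) -> float:
    """One multiply-accumulate on decomposed operands."""
    (sa, ea, ma), (sb, eb, mb) = a, b
    sign = sa ^ sb             # sign handling: a single XOR
    exp = ea + eb              # exponent handling: a single add
    prod = ma * mb             # the shared integer mantissa product
    value = math.ldexp(prod, exp)
    return acc - value if sign else acc + value

# float x float, then int x float, through the same integer multiply
acc = mac_step(0.0, decompose_float(1.5), decompose_float(-2.0))
acc = mac_step(acc, decompose_int(3), decompose_float(0.5))
print(acc)
```

In this model every datatype combination reaches the same `ma * mb` multiply; only the inexpensive sign and exponent paths differ per format, which is the property that makes resource sharing across datatypes plausible.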
