BitMoD: Bit-serial Mixture-of-Datatype LLM Acceleration

Yuzong Chen; Ahmed F. AbouElhamayed; Xilai Dai; Yang Wang; Marta Andronic; George A. Constantinides; Mohamed S. Abdelfattah

arXiv:2411.11745·cs.LG·April 29, 2025

TL;DR

BitMoD is an algorithm-hardware co-design that quantizes LLM weights to 4 or 3 bits by choosing a numerical data type per group of weights, and runs the quantized model on a bit-serial accelerator. It reports 1.69x and 1.48x speedups over the prior accelerators ANT and OliVe while maintaining high accuracy.

Contribution

The paper presents fine-grained data type adaptation, which assigns a different numerical data type to each group of weights, and a bit-serial processing element that supports multiple numerical precisions and data types for low-precision LLM acceleration.
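
To make the bit-serial idea concrete, here is a minimal sketch of a generic bit-serial dot product: weights are consumed one bit-plane per step, so the same loop handles any weight precision by changing the number of steps. This is a textbook two's-complement scheme written for illustration, not the processing element described in the paper; the function name `bit_serial_dot` and its parameters are assumptions.

```python
def bit_serial_dot(activations, weights, bits):
    """Integer dot product computed one weight bit-plane at a time.

    Hypothetical helper for illustration. Weights are signed integers
    that must fit in `bits`-bit two's complement.
    """
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    if any(w < lo or w > hi for w in weights):
        raise ValueError("weight does not fit in the given bit width")

    mask = (1 << bits) - 1  # keeps the low `bits` bits (two's complement view)
    acc = 0
    for b in range(bits):
        # Sum the activations whose weight has bit b set.
        partial = sum(
            a for a, w in zip(activations, weights) if ((w & mask) >> b) & 1
        )
        term = partial << b  # weight this bit-plane by 2**b
        # In two's complement the most significant bit carries a negative weight.
        acc += -term if b == bits - 1 else term
    return acc
```

Calling `bit_serial_dot(acts, wts, 3)` and `bit_serial_dot(acts, wts, 4)` uses the same code path with three or four steps, which is the property that makes bit-serial designs attractive for mixed precision.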

Findings

01

Quantizes LLM weights to 4 bits with <0.5% accuracy loss on discriminative tasks.

02

Quantizes LLM weights to 3 bits with better perplexity on generative tasks than prior LLM quantization methods.

03

Achieves 1.69x and 1.48x speedups over the prior accelerators ANT and OliVe, respectively.

Abstract

Large language models (LLMs) have demonstrated remarkable performance across various machine learning tasks. Yet the substantial memory footprint of LLMs significantly hinders their deployment. In this paper, we improve the accessibility of LLMs through BitMoD, an algorithm-hardware co-design solution that enables efficient LLM acceleration at low weight precision. On the algorithm side, BitMoD introduces fine-grained data type adaptation that uses a different numerical data type to quantize a group of (e.g., 128) weights. Through the careful design of these new data types, BitMoD is able to quantize LLM weights to very low precision (e.g., 4 bits and 3 bits) while maintaining high accuracy. On the hardware side, BitMoD employs a bit-serial processing element to easily support multiple numerical precisions and data types; our hardware design includes two key innovations: First, it…
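
The per-group data type selection described in the abstract can be sketched as follows. This is a rough illustration under stated assumptions: the two 3-bit code books in `CANDIDATE_GRIDS` are hypothetical and are not the data types the paper designs, and picking the candidate with the lowest mean squared error is an assumed selection rule. The names `quantize_group` and `select_dtype_per_group` are also made up for this example.

```python
# Hypothetical 3-bit code books (8 values each); illustrative only,
# NOT the data types defined in the BitMoD paper.
CANDIDATE_GRIDS = {
    "uniform": [-4, -3, -2, -1, 0, 1, 2, 3],
    "log_like": [-4, -2, -1, -0.5, 0.5, 1, 2, 4],
}


def quantize_group(weights, grid):
    """Scale the grid to the group's range and round to the nearest point.

    Returns (dequantized_weights, mean_squared_error).
    """
    max_w = max(abs(w) for w in weights)
    max_g = max(abs(g) for g in grid)
    scale = max_w / max_g if max_w > 0 else 1.0
    deq = [min((g * scale for g in grid), key=lambda v: abs(v - w))
           for w in weights]
    mse = sum((w - d) ** 2 for w, d in zip(weights, deq)) / len(weights)
    return deq, mse


def select_dtype_per_group(weights, group_size=128):
    """For each group, pick the candidate grid with the lowest error.

    Assumed selection rule (minimum MSE). Returns a list of (name, mse).
    """
    choices = []
    for start in range(0, len(weights), group_size):
        group = weights[start:start + group_size]
        errors = {name: quantize_group(group, grid)[1]
                  for name, grid in CANDIDATE_GRIDS.items()}
        best = min(errors, key=errors.get)
        choices.append((best, errors[best]))
    return choices
```

In this sketch each group would need to store its chosen data type alongside its scale, so the selection adds a small per-group metadata cost that shrinks as `group_size` grows.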

Code & Models

Repositories

yc2367/bitmod-hpca-25 (PyTorch, official)

Taxonomy

Topics: Advanced Data Compression Techniques · Neural Networks and Applications · Algorithms and Data Compression