BitMoD: Bit-serial Mixture-of-Datatype LLM Acceleration
Yuzong Chen, Ahmed F. AbouElhamayed, Xilai Dai, Yang Wang, Marta Andronic, George A. Constantinides, Mohamed S. Abdelfattah

TL;DR
BitMoD is an algorithm-hardware co-design that quantizes LLM weights to low precision (e.g., 4 bits and 3 bits) and accelerates them with a bit-serial design, improving speed over prior accelerators while maintaining high accuracy.
Contribution
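The paper presents fine-grained data type adaptation, which picks a numerical data type per group of weights, and a bit-serial hardware design for low-precision LLM acceleration. Together they deliver higher speed than prior accelerators while preserving accuracy.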
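To make the bit-serial idea concrete, here is a minimal software sketch of generic bit-serial arithmetic. It is not the paper's processing element: the function name `bit_serial_dot` and the choice of two's-complement integer weights are assumptions made for illustration. The point it shows is that one loop handles any weight precision, because only the number of processed bit planes changes.

```python
import numpy as np

def bit_serial_dot(weights, activations, bits):
    """Dot product that consumes one weight bit plane per step.

    Hypothetical illustration: weights are signed integers that fit in
    `bits`-bit two's complement; activations are ordinary integers.
    """
    w = np.asarray(weights, dtype=np.int64)
    x = np.asarray(activations, dtype=np.int64)
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    if w.min() < lo or w.max() > hi:
        raise ValueError(f"weights do not fit in {bits}-bit two's complement")

    # Two's-complement bit pattern of each weight, kept as a non-negative value.
    w_pattern = w & ((1 << bits) - 1)

    acc = 0
    for b in range(bits):
        bit_plane = (w_pattern >> b) & 1          # 0/1 per weight
        partial = int(np.dot(bit_plane, x))       # sum of selected activations
        sign = -1 if b == bits - 1 else 1         # MSB carries negative weight
        acc += sign * (partial << b)              # shift-and-accumulate
    return acc

weights = [-4, 3, -1, 2]
activations = [5, -2, 7, 1]
print(bit_serial_dot(weights, activations, bits=3))  # 3 bit planes
print(bit_serial_dot(weights, activations, bits=4))  # same values, 4 bit planes
```

Both calls return the same value as an ordinary integer dot product. Only the loop count differs, which is why a bit-serial datapath can support several precisions without separate multipliers.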
Findings
Quantizes LLM weights to 4 bits with <0.5% accuracy loss on discriminative tasks.
Quantizes LLM weights to 3 bits with better perplexity than prior LLM quantization schemes on generative tasks.
Achieves 1.69x and 1.48x speedups over prior accelerators ANT and OliVe, respectively.
Abstract
Large language models (LLMs) have demonstrated remarkable performance across various machine learning tasks. Yet the substantial memory footprint of LLMs significantly hinders their deployment. In this paper, we improve the accessibility of LLMs through BitMoD, an algorithm-hardware co-design solution that enables efficient LLM acceleration at low weight precision. On the algorithm side, BitMoD introduces fine-grained data type adaptation that uses a different numerical data type to quantize a group of (e.g., 128) weights. Through the careful design of these new data types, BitMoD is able to quantize LLM weights to very low precision (e.g., 4 bits and 3 bits) while maintaining high accuracy. On the hardware side, BitMoD employs a bit-serial processing element to easily support multiple numerical precisions and data types; our hardware design includes two key innovations: First, it…
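The per-group data type selection described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the paper's method: the two candidate grids, the max-abs scaling, and the mean-squared-error selection rule are assumptions, and the names `CANDIDATE_GRIDS`, `quantize_group`, and `quantize_mixed` are hypothetical.

```python
import numpy as np

# Hypothetical 3-bit candidate grids (8 levels each). These are NOT the
# data types defined in the paper; they only illustrate the selection step.
CANDIDATE_GRIDS = {
    "uniform": np.array([-4, -3, -2, -1, 0, 1, 2, 3], dtype=np.float64),
    "dense_near_zero": np.array([-4, -2, -1, -0.5, 0, 0.5, 1, 2], dtype=np.float64),
}

def quantize_group(w, grid):
    """Quantize one weight group to `grid` and return the dequantized values."""
    max_abs = np.max(np.abs(w))
    scale = max_abs / np.max(np.abs(grid)) if max_abs > 0 else 1.0
    # Snap each scaled weight to the nearest grid level.
    idx = np.argmin(np.abs(w[:, None] / scale - grid[None, :]), axis=1)
    return grid[idx] * scale

def quantize_mixed(weights, group_size=128):
    """Pick, per group, the candidate grid with the lowest squared error."""
    w = np.asarray(weights, dtype=np.float64)
    if w.size % group_size != 0:
        raise ValueError("weight count must be a multiple of group_size")
    groups = w.reshape(-1, group_size)
    dequantized = np.empty_like(groups)
    choices = []
    for i, g in enumerate(groups):
        best_name, best_deq, best_err = None, None, np.inf
        for name, grid in CANDIDATE_GRIDS.items():
            deq = quantize_group(g, grid)
            err = np.mean((g - deq) ** 2)
            if err < best_err:
                best_name, best_deq, best_err = name, deq, err
        dequantized[i] = best_deq
        choices.append(best_name)
    return dequantized.reshape(w.shape), choices

rng = np.random.default_rng(0)
weights = rng.normal(size=4 * 128)
deq, choices = quantize_mixed(weights, group_size=128)
print(choices)                         # grid chosen for each of the 4 groups
print(np.mean((weights - deq) ** 2))   # overall quantization error
```

Because each group keeps whichever grid reconstructs it best, the overall error is never worse than forcing a single grid on every group. The cost is a small per-group tag recording which data type was chosen.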
Taxonomy
Topics: Advanced Data Compression Techniques · Neural Networks and Applications · Algorithms and Data Compression
