BitNet a4.8: 4-bit Activations for 1-bit LLMs
Hongyu Wang, Shuming Ma, Furu Wei

TL;DR
BitNet a4.8 introduces 4-bit activations and sparsification techniques for 1-bit LLMs, improving inference speed and efficiency while maintaining performance comparable to BitNet b1.58.
Contribution
This work presents a novel hybrid quantization and sparsification method enabling 4-bit activations in 1-bit LLMs, reducing inference costs and supporting faster, more efficient deployment.
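For readers new to 1-bit LLMs, the weight side can be sketched in a few lines. This is a minimal illustration assuming the absmean ternary scheme introduced with BitNet b1.58: weights are divided by their mean absolute value, rounded, and clipped to {-1, 0, +1}. The function name `ternarize` is hypothetical, and the sketch works on a flat Python list rather than on real weight tensors.

```python
def ternarize(weights, eps=1e-5):
    """Absmean ternary quantization in the style of BitNet b1.58 (sketch).

    Returns ternary codes in {-1, 0, +1} and a per-tensor scale gamma,
    so that each weight is approximated by gamma * code.
    """
    # gamma: mean absolute value of the weights
    gamma = sum(abs(w) for w in weights) / len(weights)
    codes = []
    for w in weights:
        q = round(w / (gamma + eps))      # scale, then round to the nearest integer
        codes.append(max(-1, min(1, q)))  # clip into the ternary range
    return codes, gamma


w = [0.9, -0.05, -1.3, 0.4, 0.0, -0.6]
codes, gamma = ternarize(w)
approx = [gamma * c for c in codes]       # dequantized weights
```

A ternary value carries log2(3), roughly 1.58 bits of information, which is where the "b1.58" name comes from.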
Findings
Achieves comparable performance to BitNet b1.58
Enables faster inference with 4-bit kernels
Activates only 55% of parameters and supports 3-bit KV cache
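The 3-bit KV cache can be illustrated with a simple quantize/dequantize round trip. The abstract does not say how keys and values are quantized, so this sketch assumes symmetric per-vector absmax scaling into the signed 3-bit range [-4, 3]. `quantize_kv_3bit` and `dequantize` are hypothetical helpers, not the authors' code.

```python
BITS = 3
QMAX = 2 ** (BITS - 1) - 1  # 3: largest positive signed 3-bit value
QMIN = -(2 ** (BITS - 1))   # -4: smallest signed 3-bit value


def quantize_kv_3bit(vec, eps=1e-5):
    """Symmetric absmax quantization of one key or value vector (hypothetical helper)."""
    scale = max(max(abs(x) for x in vec) / QMAX, eps)  # eps guards all-zero vectors
    codes = [max(QMIN, min(QMAX, round(x / scale))) for x in vec]
    return codes, scale


def dequantize(codes, scale):
    """Map integer codes back to approximate floating-point values."""
    return [c * scale for c in codes]


k = [0.12, -0.6, 0.33, 0.0, -0.21, 0.45]
codes, scale = quantize_kv_3bit(k)
k_hat = dequantize(codes, scale)
```

Ignoring the per-vector scales, 3-bit codes occupy 3/16 (about 19%) of the memory of a 16-bit cache.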
Abstract
Recent research on 1-bit Large Language Models (LLMs), such as BitNet b1.58, presents a promising direction for reducing the inference cost of LLMs while maintaining their performance. In this work, we introduce BitNet a4.8, which enables 4-bit activations for 1-bit LLMs. BitNet a4.8 employs a hybrid quantization and sparsification strategy to mitigate the quantization errors introduced by outlier channels. Specifically, we use 4-bit activations for the inputs to the attention and feed-forward network layers, while sparsifying intermediate states and then quantizing them to 8 bits. Extensive experiments demonstrate that BitNet a4.8 achieves performance comparable to BitNet b1.58 with equivalent training costs, while running faster at inference by enabling 4-bit (INT4/FP4) kernels. Additionally, BitNet a4.8 activates only 55% of parameters and supports 3-bit KV cache, further enhancing…
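The hybrid strategy in the abstract can be sketched on a single token's activation vector. This is a minimal sketch under stated assumptions: per-token absmax scaling for both the 4-bit and 8-bit quantizers, magnitude-based top-k selection as the sparsification step, and a keep ratio of 0.5 chosen only for illustration. The paper's actual scaling functions and sparsity levels may differ, and `quantize_absmax` and `topk_sparsify` are hypothetical names.

```python
def quantize_absmax(x, bits, eps=1e-5):
    """Per-token symmetric absmax quantization to signed `bits`-bit integers.

    Returns integer codes and a scale such that x is approximately scale * codes.
    """
    qmax = 2 ** (bits - 1) - 1                  # 7 for INT4, 127 for INT8
    scale = max(max(abs(v) for v in x) / qmax, eps)
    codes = [max(-qmax - 1, min(qmax, round(v / scale))) for v in x]
    return codes, scale


def topk_sparsify(x, keep_ratio):
    """Zero all but the largest-magnitude entries (keep_ratio is illustrative)."""
    k = max(1, int(len(x) * keep_ratio))
    # indices of the k entries with the largest absolute value
    keep = set(sorted(range(len(x)), key=lambda i: abs(x[i]), reverse=True)[:k])
    return [v if i in keep else 0.0 for i, v in enumerate(x)]


# Input to an attention or feed-forward block: quantize directly to 4 bits.
block_input = [0.7, -0.3, 0.05, 1.6, -0.9, 0.2, -0.1, 0.4]
int4_codes, int4_scale = quantize_absmax(block_input, bits=4)

# Intermediate state: sparsify first, then quantize the survivors to 8 bits.
intermediate = [0.02, -0.01, 12.0, 0.5, -0.03, 0.7, 0.01, -0.4]
sparse = topk_sparsify(intermediate, keep_ratio=0.5)
int8_codes, int8_scale = quantize_absmax(sparse, bits=8)
```

When an activation entry is zero, the matching row of the next weight matrix contributes nothing to the product and need not be read; this is the general mechanism by which activation sparsity lowers the number of active parameters.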
Taxonomy
Topics: Advancements in Semiconductor Devices and Circuit Design · Semiconductor materials and devices · VLSI and Analog Circuit Testing
Methods: Softmax · Attention Is All You Need
