BitNet b1.58 Reloaded: State-of-the-art Performance Also on Smaller Networks
Jacob Nielsen, Peter Schneider-Kamp

TL;DR
This paper demonstrates that 1.58-bit quantization-aware training achieves state-of-the-art results on small language and vision models, making it a promising method for resource-constrained deployment.
Contribution
The paper introduces a median-based variant of BitNet b1.58 and extensively evaluates its performance on small models, extending the applicability of 1.58-bit quantization.
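The following is a minimal sketch of the difference. It assumes the absmean formulation of BitNet b1.58, in which each weight is divided by a per-tensor scale and rounded to the nearest value in {-1, 0, +1}. The median-based variant computes that scale as the median rather than the mean of the absolute weights. The helper name `ternary_quantize`, the `eps` value and the toy weights are illustrative assumptions, not the authors' code.

```python
from statistics import mean, median


def ternary_quantize(weights, use_median=False, eps=1e-5):
    """Quantize a flat list of weights to {-1, 0, +1} with one per-tensor scale.

    Hypothetical helper: the scale is mean(|w|) (absmean, BitNet b1.58) or,
    with use_median=True, median(|w|) (the median-based variant).
    Returns the ternary values and the scale needed to dequantize them.
    """
    abs_w = [abs(w) for w in weights]
    scale = median(abs_w) if use_median else mean(abs_w)
    # Divide by the scale, round to the nearest integer, then clip to [-1, 1].
    ternary = [max(-1, min(1, round(w / (scale + eps)))) for w in weights]
    return ternary, scale


toy = [0.2, -0.25, 0.3, -0.1, 3.0]  # one large-magnitude outlier
print(ternary_quantize(toy))                   # mean-based scale
print(ternary_quantize(toy, use_median=True))  # median-based scale
```

In this toy list the single large weight pulls the mean-based scale up to 0.77, so every small weight rounds to 0 ([0, 0, 0, 0, 1]). The median-based scale of 0.25 keeps their signs ([1, -1, 1, 0, 1]).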
Findings
1.58-bit models outperform previous quantization methods on small models.
Robustness patterns differ between small and large models.
State-of-the-art performance achieved on small vision and language models.
Abstract
Recently proposed methods for 1-bit and 1.58-bit quantization-aware training investigate the performance and behavior of these methods in the context of large language models, finding state-of-the-art performance for models with more than 3B parameters. In this work, we investigate 1.58-bit quantization for small language and vision models ranging from 100K to 48M parameters. We introduce a variant of BitNet b1.58 that allows the quantization process to rely on the median rather than the mean. Through extensive experiments, we investigate the performance of 1.58-bit models obtained through quantization-aware training. We further investigate the robustness of 1.58-bit quantization-aware training to changes in the learning rate and to regularization through weight decay, finding different patterns for small language and vision models than previously reported for large language models.…
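Quantization-aware training can be sketched as follows, under stated assumptions. Full-precision latent weights are kept, and the forward pass uses their ternary quantization multiplied by the scale. The backward pass uses a straight-through estimator, which applies the gradient computed for the quantized weights directly to the latent weights. The toy linear model, squared-error loss, plain SGD update and L2-style weight decay are hypothetical choices for illustration; the optimizer and hyperparameters actually used in the paper are not given in this summary. The `lr` and `weight_decay` arguments correspond to the two settings whose effect the abstract says is studied.

```python
from statistics import mean, median


def quantize(latent, use_median=False, eps=1e-5):
    """Ternary-quantize latent weights and return them dequantized (value * scale).

    Scale is mean(|w|) (absmean, BitNet b1.58) or median(|w|) (median variant).
    """
    abs_w = [abs(w) for w in latent]
    scale = median(abs_w) if use_median else mean(abs_w)
    return [max(-1, min(1, round(w / (scale + eps)))) * scale for w in latent]


def qat_step(latent, x, y, lr=0.1, weight_decay=0.01, use_median=False):
    """One hypothetical SGD step of quantization-aware training for y ~ w . x.

    The forward pass uses the quantized weights. The backward pass treats the
    quantizer (including its scale) as the identity, i.e. a straight-through
    estimator, so the gradient w.r.t. the quantized weights updates the
    full-precision latent weights. Returns (new latent weights, loss).
    """
    w_q = quantize(latent, use_median)
    pred = sum(wi * xi for wi, xi in zip(w_q, x))
    err = pred - y
    loss = 0.5 * err * err
    # d(loss)/d(w_q[i]) = err * x[i]; the STE passes it through unchanged.
    # Weight decay (L2-style, added to the gradient) acts on the latent weights.
    new_latent = [
        w - lr * (err * xi + weight_decay * w) for w, xi in zip(latent, x)
    ]
    return new_latent, loss


# Usage: one step on a toy example with three inputs.
latent, loss = qat_step([0.5, -0.5, 0.1], [1.0, 2.0, 3.0], y=0.0, lr=0.1, weight_decay=0.0)
print(latent, loss)
```

In this sketch weight decay shrinks the latent weights, and through them the absmean scale, even when the loss gradient is zero.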
