BitNet b1.58 Reloaded: State-of-the-art Performance Also on Smaller Networks

Jacob Nielsen; Peter Schneider-Kamp

arXiv:2407.09527 · cs.CV · July 16, 2024

TL;DR

This paper demonstrates that 1.58-bit quantization-aware training achieves state-of-the-art results on small language and vision models, making it a promising method for resource-constrained deployment.

Contribution

It introduces a median-based variant of BitNet b1.58 and extensively evaluates its performance on small models, extending the applicability of 1.58-bit quantization.
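
As a rough illustration of what a median-based variant could look like, the sketch below quantizes a weight tensor to the ternary values {-1, 0, +1} using a single per-tensor scale. The mean branch follows the absmean scheme commonly described for BitNet b1.58; the median branch simply swaps the mean of the absolute weights for their median, which is an assumption about the variant and not the authors' exact formulation. The function name `ternary_quantize`, the `reduce` parameter, and the `eps` value are hypothetical.

```python
import numpy as np

def ternary_quantize(w, reduce="mean", eps=1e-8):
    """Quantize weights to {-1, 0, +1} with one per-tensor scale.

    reduce="mean"   : scale is the mean of |w| (absmean scheme).
    reduce="median" : scale is the median of |w| (assumed variant).
    """
    abs_w = np.abs(w)
    if reduce == "mean":
        scale = np.mean(abs_w)
    elif reduce == "median":
        scale = np.median(abs_w)
    else:
        raise ValueError("reduce must be 'mean' or 'median'")
    scale = float(scale) + eps  # eps guards against an all-zero tensor
    # Scale, round to the nearest integer, then clip to the ternary range.
    q = np.clip(np.round(w / scale), -1, 1).astype(np.int8)
    return q, scale

# Toy weights, made up for illustration only.
w = np.array([0.9, -0.05, 0.4, -1.2, 0.02, 0.3])
q_mean, s_mean = ternary_quantize(w, "mean")
q_med, s_med = ternary_quantize(w, "median")
print(q_mean, round(s_mean, 4))
print(q_med, round(s_med, 4))
```

The two scales differ (the median is less sensitive to a few large weights), even when the resulting ternary pattern is the same, as it is for this toy tensor. In quantization-aware training the quantizer would sit inside the forward pass with gradients passed through it, which this sketch does not cover.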

Findings

01. 1.58-bit models outperform previous quantization methods on small models.

02. Robustness patterns differ between small and large models.

03. State-of-the-art performance is achieved on small vision and language models.

Abstract

Recently proposed methods for 1-bit and 1.58-bit quantization-aware training have been studied in the context of large language models, where they reach state-of-the-art performance for models with more than 3B parameters. In this work, we investigate 1.58-bit quantization for small language and vision models ranging from 100K to 48M parameters. We introduce a variant of BitNet b1.58 that can rely on the median rather than the mean in the quantization process. Through extensive experiments, we investigate the performance of 1.58-bit models obtained through quantization-aware training. We further investigate the robustness of 1.58-bit quantization-aware training to changes in the learning rate and to regularization through weight decay, finding different patterns for small language and vision models than previously reported for large language models.…
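
The "1.58-bit" label comes from the information content of a ternary weight: three possible values carry log2(3) ≈ 1.585 bits. The sketch below works through that arithmetic and estimates weight storage at the two ends of the 100K to 48M parameter range mentioned in the abstract. It assumes ideal packing at exactly log2(3) bits per weight and a 16-bit baseline, and ignores scales, activations, and any layers kept in higher precision; the helper name `weight_megabytes` is hypothetical.

```python
import math

# Information content of one weight that takes a value in {-1, 0, +1}.
bits_per_ternary = math.log2(3)

def weight_megabytes(n_params, bits_per_weight):
    """Idealized weight storage in megabytes (1 MB = 1e6 bytes)."""
    return n_params * bits_per_weight / 8 / 1e6

for n_params in (100_000, 48_000_000):
    fp16 = weight_megabytes(n_params, 16)
    ternary = weight_megabytes(n_params, bits_per_ternary)
    print(f"{n_params:>10,d} params: 16-bit {fp16:.3f} MB, ternary {ternary:.3f} MB")
```

Under these assumptions the ratio between the two storage figures is 16 / log2(3), roughly a tenfold reduction, independent of model size.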

Taxonomy

Topics: Brain Tumor Detection and Classification · IoT and Edge/Fog Computing

Methods: Attentive Walk-Aggregating Graph Neural Network