1-Bit Wonder: Improving QAT Performance in the Low-Bit Regime through K-Means Quantization

Sohir Maskey; Constantin Eichenberg; Johannes Messner; Douglas Orr

arXiv:2602.15563·cs.LG·February 18, 2026

Open Access

TL;DR

This paper empirically studies low-bit quantization-aware training (QAT) for large language models. It shows that k-means based weight quantization outperforms integer formats and that, under a fixed inference memory budget, 1-bit quantization gives the best performance on generative downstream tasks.

Contribution

The paper presents an empirical study of QAT in the low-bit regime, comparing quantization formats and bit-widths on downstream tasks rather than perplexity alone, and highlights the effectiveness of k-means based 1-bit quantization for LLMs.

Findings

01 k-means based weight quantization outperforms integer formats

02 1-bit quantization achieves the best downstream performance under a fixed memory budget

03 k-means quantization can be implemented efficiently on standard hardware

Abstract

Quantization-aware training (QAT) is an effective method to drastically reduce the memory footprint of LLMs while keeping performance degradation at an acceptable level. However, the optimal choice of quantization format and bit-width presents a challenge in practice. The full design space of quantization is not fully explored in the context of QAT, and the precise trade-off between quantization and downstream performance is poorly understood, as comparisons often rely solely on perplexity-based evaluations. In this work, we address these shortcomings with an empirical study of QAT in the low-bit regime. We show that k-means based weight quantization outperforms integer formats and can be implemented efficiently on standard hardware. Furthermore, we find that, under a fixed inference memory budget, the best performance on generative downstream tasks is achieved with $1$-bit quantized…
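
To make the idea of k-means based weight quantization concrete, the following is a minimal sketch, not the authors' implementation. It runs Lloyd's algorithm on the scalar values of a single weight tensor and stores one small code per weight plus a codebook of 2^bits centroids. The function names (`kmeans_quantize_1d`, `dequantize`, `weight_memory_bytes`), the per-tensor codebook, and the quantile-based initialisation are all assumptions made for illustration; the abstract does not specify these details, and the QAT training loop itself is not shown.

```python
import numpy as np


def kmeans_quantize_1d(weights, bits=1, iters=25):
    """Scalar k-means (Lloyd's algorithm) over one tensor.

    Hypothetical helper: returns (codes, centroids) with 2**bits centroids.
    """
    w = np.asarray(weights, dtype=np.float64).ravel()
    k = 2 ** bits
    # Assumed initialisation: centroids at evenly spaced quantiles.
    centroids = np.quantile(w, (np.arange(k) + 0.5) / k)
    for _ in range(iters):
        # Assignment step: index of the nearest centroid for every weight.
        codes = np.argmin(np.abs(w[:, None] - centroids[None, :]), axis=1)
        # Update step: move each centroid to the mean of its cluster.
        for j in range(k):
            members = w[codes == j]
            if members.size:
                centroids[j] = members.mean()
    # Final assignment against the converged centroids.
    codes = np.argmin(np.abs(w[:, None] - centroids[None, :]), axis=1)
    return codes.astype(np.uint8), centroids


def dequantize(codes, centroids, shape):
    """Look up each code in the codebook and restore the tensor shape."""
    return centroids[codes].reshape(shape)


def weight_memory_bytes(num_params, bits):
    """Weight storage only; codebook and metadata overhead are ignored."""
    return num_params * bits / 8


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    w = rng.normal(size=(64, 64))
    codes, centroids = kmeans_quantize_1d(w, bits=1)
    w_hat = dequantize(codes, centroids, w.shape)
    print("centroids:", centroids)
    print("reconstruction MSE:", np.mean((w - w_hat) ** 2))
    # Same byte budget: a 1-bit model can hold 16x the weights of a 16-bit one.
    print(weight_memory_bytes(16_000_000_000, 1) == weight_memory_bytes(1_000_000_000, 16))
```

Unlike an integer format, whose levels sit on a uniform grid, the two centroids here are fitted to the weight distribution. The `weight_memory_bytes` helper illustrates the fixed-budget trade-off the abstract refers to: at equal memory, lower bit-widths leave room for proportionally more parameters.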

Taxonomy

Topics: Advanced Neural Network Applications · Parallel Computing and Optimization Techniques · Embedded Systems Design Techniques