LittleBit: Ultra Low-Bit Quantization via Latent Factorization

Banseok Lee; Dongkyu Kim; Youngcheon You; Youngmin Kim

arXiv:2506.13771 · cs.LG · February 6, 2026

TL;DR

LittleBit introduces an ultra low-bit quantization framework for large language models, achieving significant compression and speedup while maintaining high performance through latent factorization and compensation mechanisms.

Contribution

The paper presents a novel quantization method targeting 0.1 bits per weight, combining latent factorization with multi-scale compensation and new training techniques for extreme model compression.
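As a rough guide to how much latent capacity a 0.1 BPW budget leaves, the sketch below counts bits for a single weight matrix under an assumed storage model. It charges one bit per entry of the binarized factors U (d_out × r) and V (d_in × r), plus 16-bit row, column, and latent scale vectors. The function names and the cost model are hypothetical illustrations, not the paper's accounting. They also ignore embeddings and any layers that may be kept at higher precision.

```python
def bits_per_weight(d_out: int, d_in: int, rank: int, scale_bits: int = 16) -> float:
    """BPW of one matrix stored as binary factors plus scale vectors (assumed cost model)."""
    sign_bits = rank * (d_out + d_in)                 # 1 bit per entry of U and V
    scale_total = scale_bits * (d_out + d_in + rank)  # row, column, and latent scale vectors
    return (sign_bits + scale_total) / (d_out * d_in)


def max_rank_for_budget(d_out: int, d_in: int, target_bpw: float, scale_bits: int = 16) -> int:
    """Largest rank r with bits_per_weight(d_out, d_in, r) <= target_bpw."""
    budget = target_bpw * d_out * d_in
    # Solve r*(d_out + d_in) + scale_bits*(d_out + d_in + r) <= budget for integer r.
    r = (budget - scale_bits * (d_out + d_in)) // (d_out + d_in + scale_bits)
    return max(int(r), 0)  # a budget smaller than the scale overhead leaves no rank


if __name__ == "__main__":
    # A 4096 x 4096 matrix, the shape of Llama2-7B's attention projections.
    for bpw in (0.1, 0.7):
        r = max_rank_for_budget(4096, 4096, bpw)
        print(f"{bpw} BPW -> rank {r}, actual {bits_per_weight(4096, 4096, r):.4f} BPW")
```

Under these assumptions, a 4096 × 4096 matrix gets rank 188 at 0.1 BPW and rank 1414 at 0.7 BPW. The 0.1 BPW setting therefore works with roughly 7.5 times fewer latent dimensions.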

Findings

1. Achieves an approximately 31x memory reduction, compressing Llama2-13B to under 0.9 GB.
2. At 0.1 BPW on Llama2-7B, outperforms existing methods operating at 0.7 BPW.
3. Achieves an 11.6x inference speedup over FP16.

Abstract

The deployment of large language models (LLMs) is frequently hindered by prohibitive memory and computational requirements. While quantization mitigates these bottlenecks, maintaining model fidelity in the sub-1-bit regime remains a persistent challenge. In this paper, we introduce LittleBit, a novel framework for extreme LLM compression. We target quantization rates as low as $0.1$ bits per weight (BPW), achieving a memory reduction of approximately $31\times$, which effectively compresses Llama2-13B to under $0.9$ GB. We represent weights via low-rank latent matrix factorization and subsequently binarize the resulting factors. To counteract the information loss inherent to such drastic precision reduction, we integrate a multi-scale compensation mechanism that learns importance parameters across row, column, and latent dimensions. Two primary contributions enable effective training:…
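To make the "binarized factors plus row, column, and latent scales" description concrete, here is a minimal sketch. It assumes the compensated weight takes the form $\hat{W} = \mathrm{diag}(h)\,\mathrm{sign}(U)\,\mathrm{diag}(l)\,\mathrm{sign}(V)^\top\,\mathrm{diag}(g)$. The names h (row scales), g (column scales), and l (latent scales), the helper functions, and this exact composition are assumptions for illustration. The abstract names the three dimensions but not the formula. Training, initialization, and the contributions elided above are not modeled. The sketch checks that applying the factors in sequence gives the same result as multiplying by the dense reconstruction.

```python
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def sign(m: Matrix) -> List[List[int]]:
    """Binarize to {-1, +1}; mapping zero to +1 is an assumed convention."""
    return [[1 if v >= 0 else -1 for v in row] for row in m]


def reconstruct(u: Matrix, v: Matrix, h: Vector, l: Vector, g: Vector) -> Matrix:
    """Dense W_hat[i][j] = h[i] * g[j] * sum_k sign(U)[i][k] * l[k] * sign(V)[j][k]."""
    su, sv = sign(u), sign(v)
    rank = len(l)
    return [
        [h[i] * g[j] * sum(su[i][k] * l[k] * sv[j][k] for k in range(rank))
         for j in range(len(g))]
        for i in range(len(h))
    ]


def factored_matvec(u: Matrix, v: Matrix, h: Vector, l: Vector, g: Vector, x: Vector) -> Vector:
    """Compute W_hat @ x without materializing the d_out x d_in matrix."""
    su, sv = sign(u), sign(v)
    rank = len(l)
    xs = [gj * xj for gj, xj in zip(g, x)]                      # column scales
    z = [sum(sv[j][k] * xs[j] for j in range(len(xs))) for k in range(rank)]  # sign(V)^T x
    z = [lk * zk for lk, zk in zip(l, z)]                       # latent scales
    y = [sum(su[i][k] * z[k] for k in range(rank)) for i in range(len(h))]    # sign(U) z
    return [hi * yi for hi, yi in zip(h, y)]                    # row scales


if __name__ == "__main__":
    u = [[0.3, -1.2], [-0.5, 0.8], [2.0, 0.1]]  # d_out = 3, rank = 2
    v = [[1.0, -0.4], [-0.7, 0.9]]              # d_in = 2, rank = 2
    h, l, g = [0.5, 1.5, 2.0], [1.0, 0.25], [0.8, 1.2]
    x = [1.0, -2.0]
    w = reconstruct(u, v, h, l, g)
    dense = [sum(w[i][j] * x[j] for j in range(len(x))) for i in range(len(w))]
    print(dense, factored_matvec(u, v, h, l, g, x))
```

The factored path touches r(d_out + d_in) sign entries per input vector instead of d_out · d_in dense weights. This is one reason a low-rank binary layout can be cheap at inference time. The sketch says nothing about how the reported speedup was measured.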

Videos

LittleBit: Ultra Low-Bit Quantization via Latent Factorization (SlidesLive)

Taxonomy

Topics: Advanced Data Compression Techniques · Image Processing Techniques and Applications