Statistically-Lossless Quantization of Large Language Models

Michael Helcig; Eldar Kurtic; Dan Alistarh

arXiv:2605.02404 · cs.LG · May 5, 2026

TL;DR

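This paper introduces a statistically-lossless quantization method for large language models that sits between lossy methods and fidelity-preserving lossless ones. It reaches task-lossless compression below 4 bits per parameter, distribution-lossless compression at 5-6 bits per parameter, and inference speedups of 1.7 to 3.6 times over FP16.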

Contribution

It formalizes task-lossless and distribution-lossless compression, proposes the Expected Acceptance Rate (EAR) metric for measuring next-token agreement between the original and quantized models, and develops SLQ, an asymmetric quantization technique with a wide bitwidth search.
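SLQ is described here only as an asymmetric quantizer with a wide bitwidth search, so the following is a generic sketch of the underlying idea, not the authors' implementation. Asymmetric quantization maps each weight w to an integer code q = round(w / s) + z, clamped to [0, 2^b - 1], using a scale s and zero point z derived from the weight range, and reconstructs w ≈ (q - z) · s. The helper `smallest_bitwidth` and its max-error criterion are hypothetical stand-ins for a bitwidth search; the paper's actual objective and search space are not given in this summary.

```python
# Generic asymmetric (min-max, zero-point) uniform quantization.
# Illustrative sketch only: not SLQ's actual quantizer or search procedure.

def quantize_asymmetric(weights, bits):
    """Map floats to integer codes in [0, 2**bits - 1] with a scale and zero point."""
    levels = 2 ** bits - 1                    # largest code, e.g. 15 for 4 bits
    w_min, w_max = min(weights), max(weights)
    scale = (w_max - w_min) / levels if w_max > w_min else 1.0
    zero_point = round(-w_min / scale)        # integer code that maps back to 0.0
    codes = [min(max(round(w / scale) + zero_point, 0), levels) for w in weights]
    return codes, scale, zero_point


def dequantize(codes, scale, zero_point):
    """Reconstruct approximate floats from integer codes."""
    return [(q - zero_point) * scale for q in codes]


def max_error(weights, bits):
    """Largest absolute reconstruction error after a quantize/dequantize round trip."""
    codes, scale, zp = quantize_asymmetric(weights, bits)
    return max(abs(w - r) for w, r in zip(weights, dequantize(codes, scale, zp)))


def smallest_bitwidth(weights, tol, candidates=range(2, 9)):
    """Hypothetical search: lowest candidate bitwidth whose max error is within tol."""
    for bits in candidates:
        if max_error(weights, bits) <= tol:
            return bits
    return None  # no candidate met the tolerance


weights = [-0.8, -0.1, 0.0, 0.3, 1.2]
codes, scale, zp = quantize_asymmetric(weights, bits=4)
print(codes, zp)
print(smallest_bitwidth(weights, tol=0.05))
```

Min-max ranges are the simplest way to choose s and z; practical quantizers usually work on small groups of weights and may tune these parameters against a loss rather than a fixed error threshold.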

Findings

1. Task-lossless compression is achieved below 4 bits per parameter.
2. Distribution-lossless compression is achieved at 5-6 bits per parameter.
3. Inference speedups of 1.7 to 3.6 times over FP16.
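To put the bitwidths above in perspective, weight storage scales linearly with bits per parameter: bytes = N · b / 8 for N parameters. The sketch below assumes the quoted bitwidths are averages that already include any metadata such as scales and zero points; the 8B parameter count is a hypothetical example, not a model from the paper.

```python
# Back-of-the-envelope weight storage at a given average bits per parameter.
# Assumes the bitwidth already includes any per-group metadata (scales, zero points).

def weight_storage_gb(num_params, bits_per_param):
    """Weight storage in gigabytes (1e9 bytes); ignores activations and the KV cache."""
    return num_params * bits_per_param / 8 / 1e9


def compression_vs_fp16(bits_per_param):
    """Storage reduction factor relative to 16-bit weights."""
    return 16 / bits_per_param


num_params = 8e9  # hypothetical 8B-parameter model
for bits in (16, 6, 5, 4):
    print(f"{bits:>2} bits/param: {weight_storage_gb(num_params, bits):5.1f} GB, "
          f"{compression_vs_fp16(bits):.2f}x smaller than FP16")
```

For this hypothetical model, FP16 weights take 16 GB, 5-6 bits take 5 to 6 GB, and 4 bits take 4 GB. These storage ratios are not the measured speedups listed above.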

Abstract

Model quantization has become essential for efficient large language model deployment, yet existing approaches involve clear trade-offs: methods such as GPTQ and AWQ achieve practical compression but are lossy, while lossless techniques preserve fidelity but typically do not accelerate inference. This paper explores the middle ground of statistically-lossless compression through three complementary notions of losslessness for quantized LLMs. First, task-lossless compression preserves zero-shot benchmark accuracy within natural sampling variance and remains achievable at aggressive bitwidths. Second, we formalize the stricter notion of distribution-lossless compression, requiring the quantized model's next-token distribution to be practically indistinguishable from the original, and propose the Expected Acceptance Rate (EAR), the maximum token-agreement probability under optimal…
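The abstract defines EAR as the maximum token-agreement probability under an optimal pairing of the two models' outputs (the sentence is cut off here). For two next-token distributions p and q over the same vocabulary, the largest achievable P(X = Y) over all couplings of X ~ p and Y ~ q is a standard result: Σ_x min(p(x), q(x)), which equals 1 - TV(p, q); the same expression is the per-token acceptance probability in speculative sampling. The sketch below assumes EAR corresponds to this quantity and, as a hypothetical aggregation, averages it over positions; the paper's exact definition and averaging are not shown in this excerpt.

```python
# Maximum agreement probability between two next-token distributions.
# Assumption: EAR is read here as this quantity averaged over positions.

def max_agreement_probability(p, q):
    """Max over couplings of P(X == Y) for X ~ p, Y ~ q: the sum of pointwise minima."""
    if len(p) != len(q):
        raise ValueError("distributions must share the same vocabulary")
    return sum(min(pi, qi) for pi, qi in zip(p, q))


def total_variation(p, q):
    """Total variation distance; max agreement equals 1 - TV."""
    return 0.5 * sum(abs(pi - qi) for pi, qi in zip(p, q))


def expected_acceptance_rate(ref_dists, quant_dists):
    """Hypothetical aggregate: mean per-position agreement across evaluation positions."""
    scores = [max_agreement_probability(p, q) for p, q in zip(ref_dists, quant_dists)]
    return sum(scores) / len(scores)


ref = [[0.5, 0.3, 0.2], [0.9, 0.05, 0.05]]    # original model, two positions
quant = [[0.4, 0.4, 0.2], [0.9, 0.05, 0.05]]  # quantized model, same positions
print(expected_acceptance_rate(ref, quant))
```

Here the first position agrees with probability 0.9 and the second with probability 1, giving an average of about 0.95. A value of exactly 1 would mean the quantized model's next-token distributions match the original at every evaluated position.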

Code & Models

Repositories

IST-DASLab/SLQ (GitHub)
