any4: Learned 4-bit Numeric Representation for LLMs

Mostafa Elhoushi; Jeff Johnson

arXiv:2507.04610 · cs.LG · July 8, 2025

TL;DR

any4 is a learned 4-bit weight quantization method for LLMs. It reaches higher accuracy than int4, fp4 and nf4 without pre-processing weights or activations, is competitive with AWQ and GPTQ, can be calibrated with a single curated sample, and comes with an open-source GPU implementation (tinygemm).

Contribution

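The paper presents any4, a learned 4-bit weight quantization technique for LLMs that is more accurate than the int4, fp4 and nf4 numeric formats and competitive with AWQ and GPTQ. It also shows that calibration can use a single curated sample, and it open-sources tinygemm, a latency-optimized GPU library.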

Findings

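01 Higher accuracy than int4, fp4 and nf4 across Llama 2, Llama 3, Mistral and Mixtral models

02 Competitive with AWQ and GPTQ, which require preprocessing of weights or activations

03 Calibration works with a single curated, diverse sample rather than hundreds of dataset samples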

Abstract

We present any4, a learned 4-bit weight quantization solution for large language models (LLMs) providing arbitrary numeric representations without requiring pre-processing of weights or activations. any4 yields higher accuracy compared to other related 4-bit numeric representation types: int4, fp4 and nf4, as evaluated on a range of model sizes, generations and families (Llama 2, Llama 3, Mistral and Mixtral). While any4 does not require preprocessing of weights or activations, it is also competitive with orthogonal techniques that require such preprocessing (e.g., AWQ and GPTQ). We also experiment with any3 and any2 and show competitiveness at lower bits. Additionally, we show that we can calibrate using a single curated diverse sample rather than hundreds of samples from a dataset as done in most quantization approaches. We also open source tinygemm, a latency optimized GPU matrix…
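To make "arbitrary numeric representation" concrete, here is a minimal sketch of a learned 16-entry codebook for one row of weights. It is an illustration under stated assumptions, not the authors' algorithm: it uses plain 1-D k-means (Lloyd's iterations) started from a uniform int4-like grid, and all function names (uniform_grid, learn_codebook and so on) are hypothetical. Because the starting point is the uniform grid and each iteration cannot increase the squared error, the learned codebook's reconstruction error on this row is never worse than the uniform grid's.

```python
import random


def uniform_grid(values, levels=16):
    """16 evenly spaced values spanning the row's range (an int4-like grid)."""
    lo, hi = min(values), max(values)
    step = (hi - lo) / (levels - 1)
    return [lo + i * step for i in range(levels)]


def assign(values, codebook):
    """Map each weight to the index (a 4-bit code) of its nearest codebook entry."""
    return [
        min(range(len(codebook)), key=lambda k: (v - codebook[k]) ** 2)
        for v in values
    ]


def learn_codebook(values, levels=16, iters=20):
    """Plain 1-D k-means, initialised from the uniform grid (an assumption
    made for this sketch, not the paper's procedure)."""
    codebook = uniform_grid(values, levels)
    for _ in range(iters):
        codes = assign(values, codebook)
        sums = [0.0] * levels
        counts = [0] * levels
        for v, c in zip(values, codes):
            sums[c] += v
            counts[c] += 1
        # Move each entry to the mean of its members; keep empty entries as-is.
        codebook = [
            sums[k] / counts[k] if counts[k] else codebook[k]
            for k in range(levels)
        ]
    return codebook


def mse(values, codebook):
    """Mean squared error after replacing each weight by its codebook entry."""
    codes = assign(values, codebook)
    return sum((v - codebook[c]) ** 2 for v, c in zip(values, codes)) / len(values)


random.seed(0)
row = [random.gauss(0.0, 1.0) for _ in range(512)]  # synthetic stand-in for a weight row

int4_like = uniform_grid(row)
learned = learn_codebook(row)

mse_uniform = mse(row, int4_like)
mse_learned = mse(row, learned)
print(f"uniform grid MSE: {mse_uniform:.5f}")
print(f"learned codebook MSE: {mse_learned:.5f}")
```

Each weight is stored as a 4-bit index into the codebook, so the storage cost per weight matches int4, fp4 or nf4; only the 16 representable values differ. The synthetic Gaussian row stands in for real model weights, and nothing here reflects how calibration data is used in the paper.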

Code & Models

Repositories

facebookresearch/any4
PyTorch · Official

Videos

any4: Learned 4-bit Numeric Representation for LLMs · SlidesLive

Taxonomy

Methods: LLaMA