Initialisation Determines the Basin: Efficient Codebook Optimisation for Extreme LLM Quantization

Ian W. Kennedy; Nafise Sadat Moosavi

arXiv:2604.08118·cs.CL·April 10, 2026

TL;DR

This paper introduces an output-aware EM initialisation method for extreme LLM quantization, significantly improving codebook optimisation and model performance at low bit precision.

Contribution

It identifies codebook initialisation as a key bottleneck and proposes OA-EM, a novel initialisation technique that enhances quantization quality across various models and compression rates.

Findings

01

OA-EM outperforms existing initialisation methods after PV-tuning.

02

Better initialisation leads to improved perplexity, especially at 2-bit precision.

03

The severity of initialisation issues scales with the representational ratio ρ.

Abstract

Additive quantization enables extreme LLM compression with O(1) lookup-table dequantization, making it attractive for edge deployment. Yet at 2-bit precision, it often fails catastrophically, even with extensive search and finetuning. We show that the dominant bottleneck is codebook initialisation. Greedy sequential initialisation frequently places the model in poor optimisation regions that subsequent beam search and PV-tuning struggle to overcome. We analyse this behaviour through the representational ratio ρ = N/KM, which characterises the relationship between weight groups and codebook capacity, and propose OA-EM, an output-aware EM initialisation method using Hessian-weighted Mahalanobis distance. Across compression rates, search budgets, and three architectures (Llama 3.2 3B, Llama 3.1 8B, Qwen 2.5 3B), OA-EM consistently produces better solutions after PV-tuning and…
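The quantities named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that N is the number of weight groups, that K and M are the codebook size and the number of codebooks (the abstract does not say which is which, and only their product is used here), and that N/KM is read as N/(K·M). All function names are hypothetical, and the Hessian H is taken as a given positive semi-definite matrix.

```python
def representational_ratio(num_groups, codebook_size, num_codebooks):
    """rho = N / (K * M): weight groups per codebook entry (assumed reading)."""
    return num_groups / (codebook_size * num_codebooks)


def dequantize_group(codes, codebooks):
    """Additive dequantization: sum one looked-up vector per codebook.

    codes     -- one integer index per codebook
    codebooks -- list of codebooks, each a list of equal-length vectors
    """
    dim = len(codebooks[0][0])
    out = [0.0] * dim
    for m, k in enumerate(codes):
        for d in range(dim):
            out[d] += codebooks[m][k][d]
    return out


def hessian_weighted_sq_dist(w, c, H):
    """Squared Mahalanobis-style distance (w - c)^T H (w - c)."""
    diff = [wi - ci for wi, ci in zip(w, c)]
    n = len(diff)
    return sum(diff[i] * H[i][j] * diff[j] for i in range(n) for j in range(n))


def nearest_centroid(w, centroids, H):
    """E-step style assignment: index of the centroid closest to w under H."""
    dists = [hessian_weighted_sq_dist(w, c, H) for c in centroids]
    return dists.index(min(dists))
```

In this toy setting, a weight group `[1.0, 0.0]` is equally far from centroids `[0.0, 0.0]` and `[1.0, 1.0]` in Euclidean distance, but with `H = [[2.0, 0.0], [0.0, 1.0]]` the error along the first coordinate costs twice as much, so `nearest_centroid` selects the second centroid. This is the sense in which a Hessian-weighted distance makes the assignment output-aware rather than purely weight-space.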

Code & Models

Repositories

kenno94-ik/aqlm-oaem
github
