On the Importance of a Multi-Scale Calibration for Quantization
Seungwoo Son, Ingyu Seong, Junhan Kim, Hyemi Jang, Yongkweon Jeon

TL;DR
This paper introduces MaCa, a multi-scale calibration method that improves post-training quantization of large language models by accounting for variable input lengths, leading to more accurate Hessian estimates and better model performance.
Contribution
MaCa is the first method to incorporate multi-scale sequence length information into Hessian estimation for LLM quantization, enhancing accuracy in low-bit settings.
Findings
MaCa consistently improves quantization accuracy on LLMs like Qwen3, Gemma3, and LLaMA3.
MaCa offers a lightweight, compatible enhancement for existing PTQ frameworks.
Multi-scale calibration significantly impacts Hessian-based weight importance estimation.
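For context, Hessian-based PTQ methods in the GPTQ family usually score weights through a layer-wise reconstruction objective. The form below is the standard one, written with notation assumed here rather than taken from the paper: X is an N × d_in matrix whose rows are calibration token activations, and W is the d_out × d_in weight matrix of a linear layer.

```latex
\min_{\widehat{W}} \; \bigl\lVert X W^{\top} - X \widehat{W}^{\top} \bigr\rVert_F^{2},
\qquad
H = 2\, X^{\top} X = 2 \sum_{n=1}^{N} x_n x_n^{\top}
```

Since H is a sum over calibration tokens, the choice of tokens, including the positions and sequence lengths they are drawn from, changes H and therefore the estimated importance of each weight.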
Abstract
Post-training quantization (PTQ) is a cornerstone for efficiently deploying large language models (LLMs), where a small calibration set critically affects quantization performance. However, conventional practices rely on random sequences of fixed length, overlooking the variable-length nature of LLM inputs. Input length directly influences the activation distribution and, consequently, the weight importance captured by the Hessian, which in turn affects quantization outcomes. As a result, Hessian estimates derived from fixed-length calibration may fail to represent the true importance of weights across diverse input scenarios. We propose MaCa (Matryoshka Calibration), a simple yet effective method for length-aware Hessian construction. MaCa (i) incorporates multi-scale sequence length information into Hessian estimation and (ii) regularizes each sequence as an independent sample,…
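The effect of sequence length on the Hessian can be illustrated with a small NumPy sketch. This is not the authors' implementation: the function `layer_hessian`, the prefix lengths, the per-prefix normalization, and the synthetic activations (with an artificially large first token) are all assumptions made for illustration, since the abstract above is truncated before the method is fully described.

```python
import numpy as np

def layer_hessian(seqs, lengths=None):
    """Accumulate a GPTQ-style proxy Hessian from per-sequence activations.

    seqs:    list of arrays of shape (T_i, d), one per calibration sequence.
    lengths: optional prefix lengths (hypothetical multi-scale scheme); each
             sequence contributes one term per prefix length that fits.
    """
    d = seqs[0].shape[1]
    H = np.zeros((d, d))
    n_samples = 0
    for X in seqs:
        T = X.shape[0]
        scales = [T] if lengths is None else [L for L in lengths if L <= T]
        for L in scales:
            P = X[:L]              # activations of the length-L prefix
            H += P.T @ P / L       # normalize per prefix so each sample weighs the same
            n_samples += 1
    return H / max(n_samples, 1)

# Synthetic activations: 4 sequences of 128 tokens, hidden size 8.
# Assumption: the first token has a much larger magnitude than the rest,
# so short prefixes see a different activation distribution than long ones.
rng = np.random.default_rng(0)
seqs = []
for _ in range(4):
    X = rng.standard_normal((128, 8))
    X[0] = 10.0
    seqs.append(X)

H_fixed = layer_hessian(seqs)                              # fixed length only
H_multi = layer_hessian(seqs, lengths=(16, 32, 64, 128))   # nested prefix lengths

print("trace, fixed length:", np.trace(H_fixed))
print("trace, multi-scale: ", np.trace(H_multi))
```

In this toy setup the two Hessians differ because shorter prefixes give more weight to the large first token. With a real causal model, prefix activations would come from forward passes rather than random data.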
Taxonomy
Topics: Advanced Data Compression Techniques · Speech Recognition and Synthesis · Advanced Neural Network Applications
