CALM: A CKA-Guided Adaptive Layer-Wise Modularization Framework for LLM Quantization
Jinhao Zhang, Yunquan Zhang, Daning Chen, Jun Sun, Zicheng Yan

TL;DR
CALM is a framework that uses Centered Kernel Alignment (CKA) to adaptively select the best quantization strategy for each layer of a large language model. It consistently outperforms both uniform quantization and existing mixed-precision methods.
Contribution
It introduces a CKA-guided, layer-wise modularization approach for heterogeneous quantization, enabling automatic, fine-tuning-free optimization of quantization strategies per layer.
Findings
Outperforms uniform quantization baselines and existing mixed-precision methods on LLMs.
Achieves lower perplexity and better downstream-task results than these baselines.
Works effectively on models such as LLaMA and Qwen.
Abstract
Current mainstream post-training quantization (PTQ) methods for large language models typically apply a uniform quantization strategy across all network layers, overlooking the substantial differences in how well each algorithm suits each layer. To address this limitation, we propose CALM (A CKA-guided Adaptive Layer-wise Modularization), a fine-tuning-free, plug-and-play framework for algorithm-heterogeneous quantization. CALM independently evaluates multiple PTQ algorithms on each layer and uses Linear Centered Kernel Alignment (CKA) as a metric to automatically select the optimal quantization strategy per layer. The individually optimized strategies are then integrated to construct a hybrid quantized model. Experiments demonstrate that our approach consistently outperforms both uniform quantization baselines and state-of-the-art mixed-precision methods across mainstream LLMs, including LLaMA…
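The per-layer selection loop described in the abstract can be illustrated with a minimal sketch. It assumes that, for each layer, linear CKA is computed between the full-precision output and each candidate's quantized output on calibration inputs, and that the candidate with the highest CKA is kept. The two round-to-nearest quantizers (`rtn_per_tensor`, `rtn_per_channel`) and the helper `select_per_layer` are hypothetical stand-ins introduced for illustration. They are not the PTQ algorithms or code that CALM actually uses, and the abstract does not name those algorithms.

```python
import numpy as np


def linear_cka(x: np.ndarray, y: np.ndarray) -> float:
    """Linear CKA between two activation matrices of shape (n_samples, n_features)."""
    # Center each feature column; CKA is defined on centered representations.
    x = x - x.mean(axis=0, keepdims=True)
    y = y - y.mean(axis=0, keepdims=True)
    # ||Y^T X||_F^2 / (||X^T X||_F * ||Y^T Y||_F); the value lies in [0, 1].
    cross = np.linalg.norm(y.T @ x, ord="fro") ** 2
    norm_x = np.linalg.norm(x.T @ x, ord="fro")
    norm_y = np.linalg.norm(y.T @ y, ord="fro")
    return float(cross / (norm_x * norm_y))


def _rtn(w: np.ndarray, scale, bits: int) -> np.ndarray:
    # Symmetric round-to-nearest fake quantization (quantize, then dequantize).
    qmax = 2 ** (bits - 1) - 1
    scale = np.maximum(scale, 1e-12)  # avoid division by zero for all-zero rows
    return np.clip(np.round(w / scale), -qmax - 1, qmax) * scale


def rtn_per_tensor(w: np.ndarray, bits: int = 4) -> np.ndarray:
    # Hypothetical candidate A: a single scale for the whole weight matrix.
    return _rtn(w, np.abs(w).max() / (2 ** (bits - 1) - 1), bits)


def rtn_per_channel(w: np.ndarray, bits: int = 4) -> np.ndarray:
    # Hypothetical candidate B: one scale per output channel (row of W).
    return _rtn(w, np.abs(w).max(axis=1, keepdims=True) / (2 ** (bits - 1) - 1), bits)


CANDIDATES = {"rtn_per_tensor": rtn_per_tensor, "rtn_per_channel": rtn_per_channel}


def select_per_layer(layers, candidates):
    """layers: name -> (W of shape (out, in), calibration inputs X of shape (n, in))."""
    plan = {}
    for name, (w, x) in layers.items():
        ref = x @ w.T  # full-precision layer output
        # Each layer is scored in isolation, on its own calibration inputs.
        scores = {algo: linear_cka(ref, x @ quant(w).T) for algo, quant in candidates.items()}
        # Keep the algorithm whose output is most similar to full precision.
        plan[name] = (max(scores, key=scores.get), scores)
    return plan


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy "model": three independent linear layers with random weights and inputs.
    layers = {
        f"layer{i}": (rng.standard_normal((64, 32)), rng.standard_normal((128, 32)))
        for i in range(3)
    }
    for name, (best, scores) in select_per_layer(layers, CANDIDATES).items():
        print(name, best, {k: round(v, 4) for k, v in scores.items()})
```

Linear CKA is invariant to isotropic scaling and orthogonal transformations of the features, so it compares the structure of a quantized layer's output rather than its raw magnitude. In a real model, the per-layer winners would then be assembled into the hybrid quantized model. How CALM collects calibration activations and propagates inputs between layers is not specified in this summary.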
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Advanced Data Compression Techniques · Natural Language Processing Techniques
