CALM: A CKA-Guided Adaptive Layer-Wise Modularization Framework for LLM Quantization

Jinhao Zhang; Yunquan Zhang; Daning Chen; Jun Sun; Zicheng Yan

arXiv:2512.16282 · cs.LG · January 9, 2026


TL;DR

CALM is a fine-tuning-free framework that uses Linear Centered Kernel Alignment (CKA) to select a post-training quantization (PTQ) strategy separately for each layer of a large language model, outperforming both uniform quantization and existing mixed-precision methods.

Contribution

It introduces a CKA-guided, layer-wise modularization approach to algorithmically heterogeneous quantization: different PTQ algorithms can be applied to different layers, and the per-layer choice is made automatically, without fine-tuning.
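For reference, the standard linear CKA between two activation matrices is given below. This summary does not say exactly which activations CALM compares; the formula assumes X holds a layer's full-precision outputs and Y its quantized outputs on the same n calibration inputs, with each column's mean subtracted.

```latex
\mathrm{CKA}_{\mathrm{lin}}(X, Y)
  = \frac{\lVert Y^{\top} X \rVert_F^{2}}
         {\lVert X^{\top} X \rVert_F \,\lVert Y^{\top} Y \rVert_F},
\qquad X \in \mathbb{R}^{n \times p},\; Y \in \mathbb{R}^{n \times q}
```

The score lies in [0, 1] and equals 1 whenever Y = cXQ for a nonzero scalar c and an orthogonal Q, so under this assumption a higher score means the quantized layer's outputs more closely match the full-precision ones, up to such transforms.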

Findings

01. Outperforms uniform quantization baselines on LLMs.

02. Achieves better perplexity and downstream-task results than the compared baselines.

03. Is effective on mainstream models, including LLaMA and Qwen.

Abstract

Current mainstream post-training quantization (PTQ) methods for large language models typically apply a uniform quantization strategy across all network layers, overlooking substantial differences in how well each algorithm suits each layer. To address this limitation, we propose CALM (A CKA-guided Adaptive Layer-wise Modularization), a fine-tuning-free, plug-and-play framework for algorithmically heterogeneous quantization. CALM independently evaluates multiple PTQ algorithms on each layer and uses Linear Centered Kernel Alignment (CKA) as a metric to automatically select the optimal quantization strategy per layer. The individually optimized strategies are then integrated to construct a hybrid quantized model. Experiments demonstrate that our approach consistently outperforms both uniform quantization baselines and state-of-the-art mixed-precision methods across mainstream LLMs, including LLaMA…
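The per-layer selection described in the abstract can be illustrated with a minimal, hypothetical sketch; it is not the authors' implementation. It assumes each layer is a single linear map, that each layer gets its own calibration batch (matching "independently evaluates"), and that CKA compares full-precision and quantized layer outputs. Two 4-bit round-to-nearest quantizers (per-tensor and per-channel) stand in for the real PTQ algorithms CALM compares; `select_per_layer`, `CANDIDATES`, `rtn4_per_tensor`, and `rtn4_per_channel` are illustrative names.

```python
import math
import random

# Hypothetical sketch: stand-in RTN quantizers replace real PTQ algorithms.


def matmul_t(x, w):
    """Layer output X @ W^T for x (n x in) and w (out x in)."""
    return [[sum(a * b for a, b in zip(row, wrow)) for wrow in w] for row in x]


def center_columns(m):
    """Subtract each column's mean over the calibration samples."""
    n = len(m)
    means = [sum(row[j] for row in m) / n for j in range(len(m[0]))]
    return [[v - mu for v, mu in zip(row, means)] for row in m]


def cross_frob_sq(a, b):
    """Squared Frobenius norm of a^T b for a (n x p) and b (n x q)."""
    total = 0.0
    for i in range(len(a[0])):
        for j in range(len(b[0])):
            s = sum(ra[i] * rb[j] for ra, rb in zip(a, b))
            total += s * s
    return total


def linear_cka(x, y):
    """Linear CKA between two activation matrices with matching rows."""
    xc, yc = center_columns(x), center_columns(y)
    den = math.sqrt(cross_frob_sq(xc, xc) * cross_frob_sq(yc, yc))
    return cross_frob_sq(xc, yc) / den if den > 0 else 0.0


def rtn(values, bits):
    """Symmetric round-to-nearest with one shared scale for `values`."""
    qmax = 2 ** (bits - 1) - 1
    scale = max(abs(v) for v in values) / qmax or 1.0
    return [round(v / scale) * scale for v in values]


def rtn4_per_tensor(w):
    """4-bit RTN with a single scale for the whole weight matrix."""
    flat = rtn([v for row in w for v in row], 4)
    cols = len(w[0])
    return [flat[i * cols:(i + 1) * cols] for i in range(len(w))]


def rtn4_per_channel(w):
    """4-bit RTN with one scale per output channel (row)."""
    return [rtn(row, 4) for row in w]


CANDIDATES = {"rtn4_per_tensor": rtn4_per_tensor,
              "rtn4_per_channel": rtn4_per_channel}


def select_per_layer(layers, calib, candidates=CANDIDATES):
    """Score every candidate on every layer by CKA against the full-precision
    output, keep the best one per layer, and assemble the hybrid model."""
    choices, hybrid, scores = {}, {}, {}
    for name, w in layers.items():
        ref = matmul_t(calib[name], w)
        scores[name] = {m: linear_cka(ref, matmul_t(calib[name], q(w)))
                        for m, q in candidates.items()}
        choices[name] = max(scores[name], key=scores[name].get)
        hybrid[name] = candidates[choices[name]](w)
    return choices, hybrid, scores


# Usage with synthetic weights (6 x 8) and calibration inputs (16 x 8).
rng = random.Random(0)
layers = {f"layer{i}": [[rng.gauss(0, 1) for _ in range(8)] for _ in range(6)]
          for i in range(2)}
calib = {name: [[rng.gauss(0, 1) for _ in range(8)] for _ in range(16)]
         for name in layers}
choices, hybrid, scores = select_per_layer(layers, calib)
print(choices)
```

Because each layer is scored in isolation, the chosen strategies can differ from layer to layer, which is the heterogeneity the framework targets; a real setup would swap in actual PTQ algorithms and the model's own calibration activations.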


Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Advanced Data Compression Techniques · Natural Language Processing Techniques