Exploring Layer-wise Information Effectiveness for Post-Training Quantization in Small Language Models

He Xiao; Qingyao Yang; Dirui Xie; Wendong Xu; Zunhai Su; Runming Yang; Wenyong Zhou; Haobo Liu; Zhengwu Liu; Ngai Wong

arXiv:2508.03332·cs.LG·December 30, 2025


TL;DR

This paper introduces LieQ, a layer-wise, geometry-driven post-training quantization method for small language models that maintains accuracy at ultra-low bit-widths while preserving hardware efficiency.

Contribution

LieQ presents a novel, metric-driven quantization framework that automatically allocates bit-widths based on layer importance, improving accuracy and efficiency for models under 8B parameters.
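As a rough illustration of what allocating bit-widths by layer importance can look like, the sketch below gives the higher precision to the top-scoring layers until an average-bit budget is used up. It is not the paper's allocation rule: the function name `allocate_bits`, the two-level choice of 2 and 4 bits, the budget parameter `avg_bits`, and the example scores are all assumptions made for illustration.

```python
def allocate_bits(scores, avg_bits=2.5, low=2, high=4):
    """Give `high` bits to the most important layers and `low` bits to the rest,
    keeping the mean bit-width at or below `avg_bits`.

    `scores` is a list of per-layer importance values (hypothetical here);
    a larger score means the layer is treated as more important.
    """
    n = len(scores)
    # Largest number of high-precision layers the budget allows.
    n_high = int((avg_bits - low) * n / (high - low) + 1e-9)
    n_high = max(0, min(n, n_high))
    # Layer indices sorted from most to least important.
    ranked = sorted(range(n), key=lambda i: scores[i], reverse=True)
    bits = [low] * n
    for i in ranked[:n_high]:
        bits[i] = high
    return bits


# Hypothetical importance scores for a four-layer model.
example_scores = [0.9, 0.1, 0.5, 0.3]
example_bits = allocate_bits(example_scores, avg_bits=2.5)
```

Every layer still receives a single bit-width, so the result is a per-layer list that a standard uniform quantizer can consume directly.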

Findings

01

LieQ reduces the accuracy gap at 2-bit quantization on Qwen3 and LLaMA3.x models.

02

It preserves standard kernel operations, enabling efficient deployment on edge devices.

03

Layer-wise functional saliency correlates with representational compactness, guiding bit-width allocation.
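One way to make "representational compactness" concrete is to measure how much of a matrix's spectral energy sits in its leading singular values. The sketch below computes that fraction with NumPy. This is only a plausible proxy chosen for illustration; the summary here does not give the paper's exact metric, and the function name `energy_concentration` and the parameter `k` are assumptions.

```python
import numpy as np


def energy_concentration(matrix, k=1):
    """Fraction of squared singular-value energy in the top-k singular values.

    Returns a value in [0, 1]; values near 1 mean the matrix is close to
    rank-k, i.e. its energy is concentrated in a few directions.
    """
    # Singular values are returned in descending order.
    s = np.linalg.svd(np.asarray(matrix, dtype=np.float64), compute_uv=False)
    energy = s ** 2
    total = energy.sum()
    if total == 0.0:
        return 0.0  # all-zero matrix: no energy to concentrate
    return float(energy[:k].sum() / total)


# A rank-1 matrix concentrates all of its energy in one direction,
# while an identity matrix spreads it evenly.
rank_one = np.outer([1.0, 2.0, 3.0], [1.0, 1.0])
spread = np.eye(4)
```

Under this proxy, a score like `energy_concentration(layer_matrix, k)` could serve as the per-layer input to a bit-width allocation step.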

Abstract

Large language models with billions of parameters are often over-provisioned: many layers contribute little unique information yet dominate the memory and energy footprint during inference. We present LieQ (Layer-wise information effectiveness Quantization), a hardware-native, metric-driven post-training quantization framework that addresses the critical challenge of maintaining accuracy in sub-8B models (those with fewer than 8B parameters) under extreme low-bit compression. LieQ keeps a uniform bit-width within each layer while mixing precision across layers, preserving standard multiplication kernels and avoiding irregular memory access, codebooks, or irregular formats at inference time. Our method uncovers a strong correlation between layer-wise functional saliency and representational compactness, revealing that layers with higher training-induced energy concentration are functionally…
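The scheme described in the abstract, one bit-width per layer with different bit-widths across layers, can be sketched with plain symmetric uniform quantization. The code below is a minimal NumPy illustration, not LieQ's quantizer: the per-tensor scale, the round-to-nearest rule, the helper names, and the toy layer names and bit assignments are all assumptions.

```python
import numpy as np


def quantize_uniform(weight, bits):
    """Symmetric uniform quantization of one layer with a single bit-width.

    Returns integer codes in [-qmax, qmax] and the scale needed to map
    them back to floating point.
    """
    qmax = 2 ** (bits - 1) - 1          # e.g. 1 for 2-bit, 7 for 4-bit
    max_abs = float(np.abs(weight).max())
    scale = max_abs / qmax if max_abs > 0 else 1.0
    codes = np.clip(np.round(weight / scale), -qmax, qmax).astype(np.int8)
    return codes, scale


def dequantize(codes, scale):
    """Map integer codes back to approximate floating-point weights."""
    return codes.astype(np.float32) * scale


# Toy model: hypothetical layer names and a hypothetical per-layer bit plan.
rng = np.random.default_rng(0)
layers = {name: rng.standard_normal((8, 8)).astype(np.float32)
          for name in ("layer0", "layer1", "layer2")}
bit_plan = {"layer0": 4, "layer1": 2, "layer2": 2}

# Each layer is quantized on one regular grid; only the grid size differs by layer.
quantized = {name: quantize_uniform(w, bit_plan[name]) for name, w in layers.items()}
```

Because every weight in a layer shares the same grid, the integer codes can be fed to ordinary dense integer kernels, with no per-weight lookup tables or codebooks.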
