Adaptive Distribution-aware Quantization for Mixed-Precision Neural Networks
Shaohang Jia, Zhiyong Huang, Zhi Yu, Mingyang Hou, Shuai Miao, Han Yang

TL;DR
This paper introduces ADQ, a mixed-precision quantization framework that adapts its weight codebooks to the weight distribution during training and allocates precision according to sensitivity, with the goal of improving neural network deployment on resource-limited devices.
Contribution
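The paper presents an adaptive weight quantization scheme with three components: quantile-based codebook initialization, online codebook adaptation via Exponential Moving Average (EMA), and sensitivity-informed mixed-precision allocation. Together they target two problems in Quantization-Aware Training (QAT): highly non-uniform activation distributions and static, mismatched weight codebooks.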
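The first two components can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names (`init_codebook_quantile`, `assign`, `ema_update`), the midpoint-quantile placement, the nearest-level assignment, and the `decay` value are all assumptions chosen for illustration.

```python
import numpy as np

def init_codebook_quantile(weights, bits):
    """Place 2**bits codebook levels at evenly spaced quantiles of the weights."""
    k = 2 ** bits
    # Midpoint quantiles (0.5/k, 1.5/k, ...): each level covers an equal share of weights.
    probs = (np.arange(k) + 0.5) / k
    return np.quantile(weights.ravel(), probs)

def assign(weights, codebook):
    """Return the index of the nearest codebook level for every weight."""
    flat = weights.ravel()
    return np.abs(flat[:, None] - codebook[None, :]).argmin(axis=1)

def ema_update(codebook, weights, decay=0.99):
    """Move each level toward the mean of the weights currently assigned to it."""
    idx = assign(weights, codebook)
    flat = weights.ravel()
    new = codebook.copy()
    for j in range(codebook.size):
        members = flat[idx == j]
        if members.size:  # levels with no assigned weights stay unchanged
            new[j] = decay * codebook[j] + (1.0 - decay) * members.mean()
    return new

# Hypothetical layer: Gaussian weights, 3-bit codebook (8 levels).
rng = np.random.default_rng(0)
w = rng.normal(0.0, 0.05, size=(64, 64))
cb = init_codebook_quantile(w, bits=3)

# Simulate a distribution shift after a training step, then adapt the codebook.
w_shifted = w + 0.01
cb2 = ema_update(cb, w_shifted, decay=0.9)

# Quantize: replace every weight by its nearest codebook level.
w_q = cb2[assign(w_shifted, cb2)].reshape(w.shape)
```

Quantile placement puts more levels where the weights are dense, which is the sense in which the codebook is "aligned with the initial weight distribution"; the EMA step then lets the levels drift with the weights instead of staying fixed.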
Findings
ResNet-18 achieves 71.512% Top-1 accuracy on ImageNet at an average bit-width of 2.81 bits.
ADQ outperforms state-of-the-art methods under similar resource constraints.
Ablation studies confirm the effectiveness of each proposed component.
Abstract
Quantization-Aware Training (QAT) is a critical technique for deploying deep neural networks on resource-constrained devices. However, existing methods often face two major challenges: the highly non-uniform distribution of activations and the static, mismatched codebooks used in weight quantization. To address these challenges, we propose Adaptive Distribution-aware Quantization (ADQ), a mixed-precision quantization framework that employs a differentiated strategy. The core of ADQ is a novel adaptive weight quantization scheme comprising three key innovations: (1) a quantile-based initialization method that constructs a codebook closely aligned with the initial weight distribution; (2) an online codebook adaptation mechanism based on Exponential Moving Average (EMA) to dynamically track distributional shifts; and (3) a sensitivity-informed strategy for mixed-precision allocation. For…
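To make the idea of sensitivity-informed allocation concrete, here is a small sketch of one way to assign bit-widths under an average-bit budget. The abstract does not specify how sensitivity is measured or how bits are assigned, so everything here is assumed: the per-layer granularity, the sensitivity scores, the candidate bit-widths, and the greedy rule in `allocate_bits` are hypothetical.

```python
def allocate_bits(sensitivity, num_params, avg_bits_budget, choices=(2, 3, 4)):
    """Greedy allocation: start every layer at the lowest bit-width, then
    upgrade the most sensitive layers while the average-bit budget holds."""
    levels = sorted(choices)
    n = len(sensitivity)
    bits = [levels[0]] * n
    total = sum(num_params)
    budget = avg_bits_budget * total  # budget in total weight bits
    used = sum(b * p for b, p in zip(bits, num_params))

    upgraded = True
    while upgraded:
        upgraded = False
        # Visit layers from most to least sensitive.
        for i in sorted(range(n), key=lambda i: -sensitivity[i]):
            pos = levels.index(bits[i])
            if pos + 1 == len(levels):
                continue  # already at the highest bit-width
            extra = (levels[pos + 1] - bits[i]) * num_params[i]
            if used + extra <= budget:
                bits[i] = levels[pos + 1]
                used += extra
                upgraded = True
                break  # restart from the most sensitive layer
    return bits, used / total

# Hypothetical four-layer network: made-up sensitivity scores and parameter counts.
sensitivity = [0.9, 0.2, 0.5, 0.1]
num_params = [1000, 4000, 2000, 3000]
bits, avg_bits = allocate_bits(sensitivity, num_params, avg_bits_budget=2.81)
print(bits, avg_bits)
```

The average bit-width is weighted by parameter count, so small but sensitive layers can be promoted cheaply while large, insensitive layers stay at low precision.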
Taxonomy
Topics: Advanced Neural Network Applications · Advanced Data Compression Techniques · Domain Adaptation and Few-Shot Learning
