Adaptive Distribution-aware Quantization for Mixed-Precision Neural Networks

Shaohang Jia; Zhiyong Huang; Zhi Yu; Mingyang Hou; Shuai Miao; Han Yang

arXiv:2510.19760 · cs.CV · October 23, 2025

TL;DR

This paper introduces ADQ, a mixed-precision quantization framework that adapts its weight codebooks during training to follow the weight distribution and allocates bit-widths according to sensitivity, with the goal of making neural networks easier to deploy on resource-constrained devices.

Contribution

The paper presents an adaptive weight quantization scheme that combines quantile-based codebook initialization, online codebook adaptation, and sensitivity-aware mixed-precision allocation. It targets two problems in QAT: highly non-uniform activation distributions and static weight codebooks that no longer match the weight distribution.
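The listing does not give ADQ's exact initialization formula. As a minimal sketch, assuming "quantile-based initialization" means placing the 2^b codewords at evenly spaced quantiles of the initial weights, the idea can be illustrated as follows. The function names `quantile_codebook` and `quantize` are hypothetical, and the Laplace-distributed toy weights stand in for a real layer.

```python
import numpy as np


def quantile_codebook(weights: np.ndarray, bits: int) -> np.ndarray:
    """Place 2**bits codewords at evenly spaced quantiles of the weights.

    Codeword k sits at the (k + 0.5) / 2**bits quantile, i.e. at the median
    of the k-th equal-probability bin, so dense regions of the weight
    distribution receive more codewords.
    """
    levels = 2 ** bits
    probs = (np.arange(levels) + 0.5) / levels
    return np.quantile(weights.ravel(), probs)


def quantize(weights: np.ndarray, codebook: np.ndarray) -> np.ndarray:
    """Replace every weight with its nearest codeword (same shape as input)."""
    idx = np.abs(weights[..., None] - codebook).argmin(axis=-1)
    return codebook[idx]


rng = np.random.default_rng(0)
w = rng.laplace(0.0, 1.0, size=(64, 64))  # heavy-tailed toy "layer"
cb = quantile_codebook(w, bits=2)
wq = quantize(w, cb)

# Compare with a uniform grid spanning [min, max] at the same bit-width.
uniform = np.linspace(w.min(), w.max(), 4)
print("quantile codebook:", np.round(cb, 3))
print("MSE quantile:", np.mean((w - wq) ** 2))
print("MSE uniform: ", np.mean((w - quantize(w, uniform)) ** 2))
```

On bell-shaped or heavy-tailed weights, a uniform grid wastes most of its levels on sparse tails, while quantile placement concentrates codewords where the weights actually are. This is the mismatch a distribution-aligned codebook is meant to avoid.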

Findings

1. Achieves 71.512% Top-1 accuracy on ImageNet with ResNet-18 at an average bit-width of 2.81 bits.

2. Outperforms state-of-the-art methods under similar resource constraints.

3. Ablation studies confirm the effectiveness of each proposed component.
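A fractional figure such as 2.81 bits arises because, under mixed precision, different layers get different integer bit-widths and the reported value is a parameter-weighted average. The sketch below illustrates that arithmetic together with a simple sensitivity-driven allocation. The greedy rule, the function names `average_bits` and `allocate_bits`, the candidate bit-widths, and the sensitivity scores are all hypothetical; the listing does not specify ADQ's actual sensitivity metric or allocation procedure.

```python
from typing import List, Sequence


def average_bits(param_counts: Sequence[int], bits: Sequence[int]) -> float:
    """Parameter-weighted average bit-width across layers."""
    total = sum(param_counts)
    return sum(n * b for n, b in zip(param_counts, bits)) / total


def allocate_bits(
    param_counts: Sequence[int],
    sensitivity: Sequence[float],
    budget: float,
    choices: Sequence[int] = (2, 3, 4),
) -> List[int]:
    """Greedy heuristic (hypothetical, not ADQ's published rule): start every
    layer at the lowest bit-width, then repeatedly give one more step of
    precision to the most sensitive layer that can still be upgraded without
    the average bit-width exceeding the budget."""
    options = sorted(choices)
    level = [0] * len(param_counts)  # index into `options` for each layer
    # Visit layers from most to least sensitive.
    order = sorted(range(len(param_counts)), key=lambda i: sensitivity[i], reverse=True)
    upgraded = True
    while upgraded:
        upgraded = False
        for i in order:
            if level[i] + 1 >= len(options):
                continue  # already at the highest bit-width
            trial = level.copy()
            trial[i] += 1
            if average_bits(param_counts, [options[k] for k in trial]) <= budget:
                level = trial
                upgraded = True
                break  # restart from the most sensitive layer
    return [options[k] for k in level]


# Toy example with made-up layer sizes and sensitivity scores.
counts = [100, 400, 400, 100]
scores = [0.9, 0.2, 0.5, 0.8]
bits = allocate_bits(counts, scores, budget=3.0)
print(bits, average_bits(counts, bits))  # [4, 2, 3, 4] 2.8
```

The result lands below the 3.0 budget because each upgrade changes the average in coarse steps; small, sensitive layers receive high precision cheaply, while large, insensitive layers stay at low bit-widths.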

Abstract

Quantization-Aware Training (QAT) is a critical technique for deploying deep neural networks on resource-constrained devices. However, existing methods often face two major challenges: the highly non-uniform distribution of activations and the static, mismatched codebooks used in weight quantization. To address these challenges, we propose Adaptive Distribution-aware Quantization (ADQ), a mixed-precision quantization framework that employs a differentiated strategy. The core of ADQ is a novel adaptive weight quantization scheme comprising three key innovations: (1) a quantile-based initialization method that constructs a codebook closely aligned with the initial weight distribution; (2) an online codebook adaptation mechanism based on Exponential Moving Average (EMA) to dynamically track distributional shifts; and (3) a sensitivity-informed strategy for mixed-precision allocation. For…
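The abstract says only that the codebook is updated online with an Exponential Moving Average. As a sketch under that assumption, one plausible form is a simplified variant of the EMA k-means updates used in vector quantization: each codeword is pulled toward the mean of the weights currently nearest to it. The function name `ema_update_codebook`, the `decay` value, and the simulated shrinking of the weights are illustrative choices, not details from the paper.

```python
import numpy as np


def ema_update_codebook(codebook: np.ndarray, weights: np.ndarray, decay: float = 0.9) -> np.ndarray:
    """One EMA step: pull each codeword toward the mean of the weights that
    are currently nearest to it. Codewords with no assigned weights stay put."""
    flat = weights.ravel()
    # Nearest-codeword assignment for every weight.
    idx = np.abs(flat[:, None] - codebook[None, :]).argmin(axis=1)
    updated = codebook.astype(float)  # astype returns a new array
    for k in range(codebook.size):
        members = flat[idx == k]
        if members.size > 0:
            updated[k] = decay * codebook[k] + (1.0 - decay) * members.mean()
    return updated


rng = np.random.default_rng(0)
w_init = rng.normal(0.0, 1.0, size=5000)
# Start from a 2-bit quantile codebook fitted to the initial weights.
codebook0 = np.quantile(w_init, (np.arange(4) + 0.5) / 4)

# Simulate a distribution shift: training shrinks the weights to half their spread.
w_shifted = 0.5 * w_init
codebook = codebook0.copy()
for _ in range(200):
    codebook = ema_update_codebook(codebook, w_shifted)

print(np.round(codebook0, 2))
print(np.round(codebook, 2))
```

A static codebook would keep its outer codewords where the initial tails were; the EMA version contracts toward the shifted weights. Because 1-D nearest-codeword assignment partitions the weights into ordered intervals, the update also keeps the codewords sorted.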

Taxonomy

Topics: Advanced Neural Network Applications · Advanced Data Compression Techniques · Domain Adaptation and Few-Shot Learning