Generalizable Mixed-Precision Quantization via Attribution Rank Preservation
Ziwei Wang, Han Xiao, Jiwen Lu, Jie Zhou

TL;DR
This paper introduces GMPQ, a mixed-precision quantization method that generalizes across datasets with minimal data, reducing search costs while maintaining high accuracy.
Contribution
We propose a novel attribution rank preservation approach for efficient, dataset-agnostic mixed-precision quantization with significantly reduced search costs.
Findings
Achieves competitive accuracy with lower complexity.
Reduces search cost compared to state-of-the-art methods.
Maintains attribution rank consistency across models.
Abstract
In this paper, we propose a generalizable mixed-precision quantization (GMPQ) method for efficient inference. Conventional methods require that the datasets used for bitwidth search and for model deployment be consistent in order to guarantee policy optimality, which leads to heavy search cost on challenging large-scale datasets in realistic applications. In contrast, GMPQ searches for a mixed-precision quantization policy that generalizes to large-scale datasets using only a small amount of data, so that the search cost is significantly reduced without performance degradation. Specifically, we observe that correctly locating network attribution is a general ability for accurate visual analysis across different data distributions. Therefore, in addition to optimizing model accuracy and complexity, we preserve attribution rank consistency between the quantized models and their full-precision counterparts via…
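To make "attribution rank consistency" concrete, the sketch below scores how well the ordering of attribution values from a quantized model matches the ordering from its full-precision counterpart. The abstract as excerpted here does not say how the consistency is measured or enforced, so the use of Spearman rank correlation, the function names `ranks` and `rank_consistency`, and the toy attribution values are all assumptions made for illustration, not the authors' formulation.

```python
def ranks(values):
    """Return the 0-based rank of each entry (ties broken by position)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, idx in enumerate(order):
        result[idx] = rank
    return result


def rank_consistency(attr_fp, attr_q):
    """Spearman rank correlation between two flattened attribution maps.

    attr_fp: attribution values from the full-precision model (hypothetical input).
    attr_q:  attribution values from the quantized model (hypothetical input).
    Uses the closed form 1 - 6 * sum(d^2) / (n * (n^2 - 1)), which is exact
    only when there are no tied values.
    """
    n = len(attr_fp)
    if n != len(attr_q) or n < 2:
        raise ValueError("inputs must have equal length of at least 2")
    r_fp, r_q = ranks(attr_fp), ranks(attr_q)
    d2 = sum((a - b) ** 2 for a, b in zip(r_fp, r_q))
    return 1.0 - 6.0 * d2 / (n * (n * n - 1))


# Toy attribution maps, flattened to 1-D lists (made-up numbers).
full_precision = [0.1, 0.7, 0.2, 0.9]
quantized_same_order = [0.05, 0.6, 0.3, 0.8]   # values differ, ordering preserved
quantized_reversed = [0.9, 0.2, 0.7, 0.1]      # ordering fully inverted

score_same = rank_consistency(full_precision, quantized_same_order)
score_reversed = rank_consistency(full_precision, quantized_reversed)
```

A score near 1 means the quantized model ranks input regions in the same order of importance as the full-precision model even if the raw attribution magnitudes differ. A search procedure would need a differentiable surrogate for this quantity, which this sketch does not provide.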
Taxonomy
Topics: Advanced Image and Video Retrieval Techniques · Domain Adaptation and Few-Shot Learning · Remote-Sensing Image Classification
