InfoQ: Mixed-Precision Quantization via Global Information Flow
Mehmet Emre Akbulut, Hazem Hesham Yousef Shalby, Fabrizio Pittorino, Manuel Roveri

TL;DR
This paper introduces InfoQ, a training-free mixed-precision quantization method that uses global information flow analysis to efficiently allocate bit-widths, achieving high compression with minimal accuracy loss.
Contribution
InfoQ proposes a novel, training-free framework that assesses layer sensitivity via mutual information changes, enabling efficient global optimization for mixed-precision quantization.
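To make the global allocation step concrete, here is a minimal sketch of bit-width selection under a size budget. It assumes per-layer sensitivity scores are already available. The function name `allocate_bits`, the additive cost, the bit-count budget, and the exhaustive search are all illustrative assumptions rather than the paper's formulation or solver, and the numbers are made up.

```python
from itertools import product


def allocate_bits(sensitivity, param_counts, budget_bits):
    """Exhaustively pick one bit-width per layer (toy sizes only).

    sensitivity[i][b] : hypothetical score for layer i at b bits (lower is better)
    param_counts[i]   : number of weights in layer i
    budget_bits       : maximum total weight storage, in bits
    """
    choices = [sorted(s) for s in sensitivity]
    best_cost, best_assign = float("inf"), None
    for assign in product(*choices):
        size = sum(n * b for n, b in zip(param_counts, assign))
        if size > budget_bits:
            continue  # violates the storage budget
        cost = sum(s[b] for s, b in zip(sensitivity, assign))
        if cost < best_cost:
            best_cost, best_assign = cost, assign
    return best_assign, best_cost


# Made-up scores for three layers; not results from the paper.
sensitivity = [
    {2: 0.90, 4: 0.20, 8: 0.01},
    {2: 0.30, 4: 0.05, 8: 0.01},
    {2: 0.10, 4: 0.02, 8: 0.01},
]
param_counts = [1000, 4000, 2000]
budget_bits = 4 * sum(param_counts)  # an average of 4 bits per weight
assignment, cost = allocate_bits(sensitivity, param_counts, budget_bits)
print(assignment, cost)
```

Exhaustive search grows exponentially with the number of layers, which is the combinatorial difficulty that motivates a cheaper global formulation; the brute force here only serves to show the objective and the constraint.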
Findings
Achieves up to 1% accuracy improvement at high compression rates
Uses two orders of magnitude less data than previous methods
Provides superior search-time/accuracy trade-off
Abstract
Mixed-precision quantization (MPQ) is crucial for deploying deep neural networks on resource-constrained devices, but finding the optimal bit-width for each layer represents a complex combinatorial optimization problem. Current state-of-the-art methods rely on computationally expensive search algorithms or local sensitivity heuristic proxies like the Hessian, which fail to capture the cascading global effects of quantization error. In this work, we argue that the quantization sensitivity of a layer should not be measured by its local properties, but by its impact on the information flow throughout the entire network. We introduce InfoQ, a novel framework for MPQ that is training-free in the bit-width search phase. InfoQ assesses layer sensitivity by quantizing each layer at different bit-widths and measuring, through a single forward pass, the resulting change in mutual information in…
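The sensitivity measurement described above can be sketched as follows. Because the abstract is cut off before it says which quantities the mutual information is computed between, this sketch assumes the mutual information between predicted classes and labels, estimated from a joint histogram. The tiny MLP, the symmetric uniform quantizer, and every function name are hypothetical stand-ins, not the authors' implementation.

```python
import numpy as np


def quantize_uniform(w, bits):
    """Symmetric uniform quantization of a weight array (requires bits >= 2)."""
    levels = 2 ** (bits - 1) - 1
    scale = np.max(np.abs(w)) / levels
    if scale == 0:
        return w.copy()
    return np.round(w / scale) * scale


def discrete_mi(x, y):
    """Mutual information (in nats) between two integer-coded arrays."""
    joint = np.zeros((x.max() + 1, y.max() + 1))
    np.add.at(joint, (x, y), 1)          # joint histogram of (x, y) pairs
    joint /= joint.sum()
    px = joint.sum(axis=1, keepdims=True)
    py = joint.sum(axis=0, keepdims=True)
    nz = joint > 0
    return float(np.sum(joint[nz] * np.log(joint[nz] / (px @ py)[nz])))


def forward(weights, x):
    """Tiny ReLU MLP; returns predicted class indices."""
    h = x
    for w in weights[:-1]:
        h = np.maximum(h @ w, 0.0)
    return np.argmax(h @ weights[-1], axis=1)


def sensitivity_table(weights, x, labels, bit_widths=(2, 4, 8)):
    """Quantize one layer at a time and record the change in MI (assumed proxy)."""
    base = discrete_mi(forward(weights, x), labels)
    table = []
    for i, w in enumerate(weights):
        row = {}
        for b in bit_widths:
            trial = list(weights)
            trial[i] = quantize_uniform(w, b)   # only layer i is quantized
            row[b] = abs(base - discrete_mi(forward(trial, x), labels))
        table.append(row)
    return table


rng = np.random.default_rng(0)
weights = [rng.normal(size=(8, 16)), rng.normal(size=(16, 4))]
x = rng.normal(size=(256, 8))
labels = forward(weights, x)  # synthetic labels: the full-precision predictions
table = sensitivity_table(weights, x, labels)
```

Each table entry costs one forward pass over a small batch and no gradient computation, which is the sense in which such a search is training-free.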
