InfoQ: Mixed-Precision Quantization via Global Information Flow

Mehmet Emre Akbulut; Hazem Hesham Yousef Shalby; Fabrizio Pittorino; Manuel Roveri

arXiv:2508.04753 · cs.LG · March 24, 2026

TL;DR

This paper introduces InfoQ, a mixed-precision quantization method whose bit-width search is training-free. It allocates bit-widths by analyzing how quantizing each layer affects information flow through the whole network, achieving high compression with minimal accuracy loss.

Contribution

InfoQ is a framework, training-free in the bit-width search phase, that scores each layer's sensitivity by the change in mutual information its quantization causes, enabling efficient global optimization of the bit-width allocation.

Findings

01 Achieves up to 1% accuracy improvement at high compression rates

02 Uses two orders of magnitude less data than previous methods

03 Provides a superior search-time/accuracy trade-off

Abstract

Mixed-precision quantization (MPQ) is crucial for deploying deep neural networks on resource-constrained devices, but finding the optimal bit-width for each layer represents a complex combinatorial optimization problem. Current state-of-the-art methods rely on computationally expensive search algorithms or local sensitivity heuristic proxies like the Hessian, which fail to capture the cascading global effects of quantization error. In this work, we argue that the quantization sensitivity of a layer should not be measured by its local properties, but by its impact on the information flow throughout the entire network. We introduce InfoQ, a novel framework for MPQ that is training-free in the bit-width search phase. InfoQ assesses layer sensitivity by quantizing each layer at different bit-widths and measuring, through a single forward pass, the resulting change in mutual information in…
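
The per-layer measurement described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration, not the authors' implementation: the toy ReLU MLP, the symmetric uniform weight quantizer, the histogram (plug-in) mutual-information estimator, and the helper names `quantize_uniform`, `forward`, `mutual_information`, and `layer_sensitivity` are all hypothetical. Because the abstract is cut off before it says where the mutual information is measured, the sketch substitutes a simple stand-in: it scores a layer by how much information about the full-precision output is lost when that layer alone is quantized.

```python
import numpy as np

def quantize_uniform(w, bits):
    """Symmetric uniform quantization of an array to `bits` bits (assumed scheme)."""
    levels = 2 ** (bits - 1) - 1          # e.g. 3 bits -> integer grid -3..3
    max_abs = np.max(np.abs(w))
    if max_abs == 0:
        return w.copy()
    scale = max_abs / levels
    return np.round(w / scale) * scale

def forward(x, weights):
    """Single forward pass through a toy MLP with ReLU between layers."""
    h = x
    for i, w in enumerate(weights):
        h = h @ w
        if i < len(weights) - 1:
            h = np.maximum(h, 0.0)
    return h

def mutual_information(a, b, bins=16):
    """Plug-in histogram estimate of I(a; b) in bits for two 1-D arrays."""
    joint, _, _ = np.histogram2d(a, b, bins=bins)
    p = joint / joint.sum()
    pa = p.sum(axis=1, keepdims=True)     # marginal of a, shape (bins, 1)
    pb = p.sum(axis=0, keepdims=True)     # marginal of b, shape (1, bins)
    mask = p > 0
    return float(np.sum(p[mask] * np.log2(p[mask] / (pa @ pb)[mask])))

def layer_sensitivity(x, weights, layer, bits):
    """Information about the full-precision output lost by quantizing one layer.

    Hypothetical proxy: H(ref) - I(ref; out), using the first output unit only.
    """
    ref = forward(x, weights)[:, 0]
    q_weights = list(weights)
    q_weights[layer] = quantize_uniform(weights[layer], bits)
    out = forward(x, q_weights)[:, 0]
    return mutual_information(ref, ref) - mutual_information(ref, out)

# Toy data and random weights; all sizes are arbitrary choices for the demo.
rng = np.random.default_rng(0)
weights = [rng.normal(size=(8, 16)),
           rng.normal(size=(16, 16)),
           rng.normal(size=(16, 4))]
x = rng.normal(size=(512, 8))

for layer in range(len(weights)):
    scores = {bits: layer_sensitivity(x, weights, layer, bits) for bits in (2, 4, 8)}
    print(layer, scores)
```

Each score costs one forward pass of the perturbed network on a small batch, with no retraining. A table of such per-layer, per-bit-width scores is the kind of input a global bit-width allocation step could consume, though how InfoQ formulates that step is not stated in the text above.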
