MixQuant: Pushing the Limits of Block Rotations in Post-Training Quantization
Sai Sanjeet, Ian Colbert, Pablo Monteagudo-Lago, Giuseppe Franco, Yaman Umuroglu, Nicholas J. Fraser

TL;DR
MixQuant is a post-training quantization method that permutes activations before block rotation to improve outlier suppression, recovering up to 90% of full-vector rotation perplexity in quantized Llama3 1B.
Contribution
The paper presents the first systematic, non-asymptotic analysis of outlier suppression for block Hadamard rotations in PTQ and proposes MixQuant, a permutation-aware framework that redistributes activation mass across blocks before rotation.
Findings
MixQuant recovers up to 90% of full-vector rotation perplexity in quantized Llama3 1B.
Permutation-based mass diffusion improves accuracy across all block sizes.
The method reduces inference overhead by merging permutations into model weights.
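The last finding rests on a simple algebraic identity: reordering the input channels of a linear layer leaves its output unchanged as long as the weight columns are reordered the same way. The following is a minimal sketch of that identity only, not the paper's procedure; the helper names (`linear`, `permute`, `fold_permutation`) and the toy values are hypothetical, chosen for illustration.

```python
def linear(weight, x):
    """Compute y = W x for a weight matrix given as a list of rows."""
    return [sum(w * v for w, v in zip(row, x)) for row in weight]


def permute(vec, perm):
    """Reorder vec so that position k holds vec[perm[k]]."""
    return [vec[i] for i in perm]


def fold_permutation(weight, perm):
    """Hypothetical helper: apply the channel permutation to every weight row
    (that is, to the columns of W) offline, so no reordering is needed at runtime."""
    return [permute(row, perm) for row in weight]


# Toy layer with 3 input channels and 2 outputs (illustrative values only).
W = [[1.0, 2.0, 3.0],
     [4.0, 5.0, 6.0]]
x = [0.5, -1.0, 2.0]
perm = [2, 0, 1]

y_reference = linear(W, x)
y_folded = linear(fold_permutation(W, perm), permute(x, perm))
print(y_reference, y_folded)
```

In a full model the reordering of `x` would itself have to be absorbed by whatever layer produces it; this sketch only checks that the two outputs agree.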
Abstract
Recent post-training quantization (PTQ) methods have adopted block rotations to diffuse outliers prior to rounding. While this reduces the overhead of full-vector rotations, the effect of block structure on outlier suppression remains poorly understood. To fill this gap, we present the first systematic, non-asymptotic analysis of outlier suppression for block Hadamard rotations. Our analysis reveals that outlier suppression is fundamentally limited by the geometry of the input vector. In particular, post-rotation outliers are deterministically minimized when the pre-rotation norm mass is evenly distributed across blocks. Guided by these insights, we introduce MixQuant, a block rotation-aware PTQ framework that redistributes activation mass via permutations prior to rotation. We propose a greedy mass diffusion algorithm to calibrate permutations by equalizing the expected…
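The idea in the abstract can be illustrated with a small sketch: rotate each block with an orthonormal Hadamard transform, then compare the largest post-rotation magnitude with and without a permutation that spreads mass across blocks. This is not the authors' algorithm. The function names, the use of squared magnitude as the per-channel mass, and the heaviest-first placement rule are all assumptions made for illustration.

```python
import math


def fwht(block):
    """Orthonormal Walsh-Hadamard transform; len(block) must be a power of two."""
    out = list(block)
    n = len(out)
    h = 1
    while h < n:
        for i in range(0, n, 2 * h):
            for j in range(i, i + h):
                a, b = out[j], out[j + h]
                out[j], out[j + h] = a + b, a - b
        h *= 2
    scale = 1.0 / math.sqrt(n)
    return [v * scale for v in out]


def block_rotate(x, block_size):
    """Rotate each contiguous block of x independently with the same transform."""
    out = []
    for start in range(0, len(x), block_size):
        out.extend(fwht(x[start:start + block_size]))
    return out


def greedy_balance(mass, block_size):
    """Assumed heuristic: visit channels heaviest first and put each one into the
    non-full block with the smallest accumulated mass. Returns the permutation
    as a list of source indices in their new order."""
    n_blocks = len(mass) // block_size
    blocks = [[] for _ in range(n_blocks)]
    totals = [0.0] * n_blocks
    for idx in sorted(range(len(mass)), key=lambda i: -mass[i]):
        open_blocks = [b for b in range(n_blocks) if len(blocks[b]) < block_size]
        target = min(open_blocks, key=lambda b: totals[b])
        blocks[target].append(idx)
        totals[target] += mass[idx]
    return [i for blk in blocks for i in blk]


# Toy activation vector with two outliers that start in the same block.
x = [8.0, 6.0, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1]
block_size = 4

perm = greedy_balance([v * v for v in x], block_size)
x_perm = [x[i] for i in perm]

peak_plain = max(abs(v) for v in block_rotate(x, block_size))
peak_perm = max(abs(v) for v in block_rotate(x_perm, block_size))
print("peak without permutation:", peak_plain)
print("peak with permutation:   ", peak_perm)
```

On this toy input the peak magnitude drops from about 7.1 to about 4.15 once the two outliers are placed in different blocks, while the vector norm is unchanged because each block rotation is orthonormal.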
Taxonomy
Topics: Advanced Data Compression Techniques · Image and Video Quality Assessment · Advanced Neural Network Applications
