Q$^2$: Quantization-Aware Gradient Balancing and Attention Alignment for Low-Bit Quantization
Zhaoyang Wang, Dong Wang

TL;DR
This paper introduces Q$^2$, a framework that improves low-bit quantization for complex visual tasks by rebalancing gradients at feature fusion stages and aligning attention, yielding better accuracy without extra inference cost.
Contribution
The paper proposes Q$^2$, a novel approach combining gradient balancing and attention alignment to enhance low-bit quantization in complex vision tasks.
Findings
+2.5% mAP on object detection
+3.7% mDICE on image segmentation
No inference overhead
Abstract
Quantization-aware training (QAT) has achieved remarkable success in low-bit (4-bit) quantization for classification networks. However, when applied to more complex visual tasks such as object detection and image segmentation, performance still suffers significant degradation. A key cause of this limitation has been largely overlooked in the literature. In this work, we revisit this phenomenon from a new perspective and identify a major failure factor: gradient imbalance at feature fusion stages, induced by accumulated quantization errors. This imbalance biases the optimization trajectory and impedes convergence under low-bit quantization. Based on this diagnosis, we propose Q$^2$, a two-pronged framework comprising: (1) Quantization-aware Gradient Balancing Fusion (Q-GBFusion), a closed-loop mechanism that dynamically rebalances gradient contributions during feature fusion; and…
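The abstract attributes the degradation to quantization errors that accumulate and skew gradient contributions where features are fused. The Python sketch below illustrates both ingredients under stated assumptions. `fake_quant` is standard symmetric uniform fake quantization (quantize, then dequantize) at 4 bits, showing the per-tensor rounding error that can build up through a network. `balance_fusion_grads` is a hypothetical norm-equalization heuristic swapped in for illustration: it rescales each fused branch's gradient to the mean L2 norm across branches. It is not Q-GBFusion, whose closed-loop update rule is not given in this excerpt, and all function names here are illustrative.

```python
import math


def fake_quant(x, bits=4):
    """Symmetric uniform fake quantization: round to a signed `bits`-bit grid, then map back to floats."""
    qmax = 2 ** (bits - 1) - 1  # 7 for 4-bit signed integers
    # Per-tensor scale from the largest magnitude; fall back to 1.0 for an all-zero input.
    max_abs = max(abs(v) for v in x)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # Round to the integer grid, clamp to [-qmax - 1, qmax], then dequantize.
    return [max(-qmax - 1, min(qmax, round(v / scale))) * scale for v in x]


def l2(g):
    """Euclidean norm of a flat gradient vector."""
    return math.sqrt(sum(v * v for v in g))


def balance_fusion_grads(grads, eps=1e-12):
    """HYPOTHETICAL rebalancing at a fusion node (not the paper's Q-GBFusion).

    Rescales each branch's gradient so every branch ends up with the mean L2 norm,
    so no single branch dominates the update. `eps` guards against division by zero.
    """
    norms = [l2(g) for g in grads]
    target = sum(norms) / len(norms)
    return [[v * target / (n + eps) for v in g] for g, n in zip(grads, norms)]


# Usage: per-element quantization error of one tensor at 4 bits.
x = [0.1, 0.5, -0.9, 0.33]
q = fake_quant(x, bits=4)
print([round(a - b, 4) for a, b in zip(x, q)])

# Usage: two fused branches whose gradient norms differ by 10x (5.0 vs 0.5).
balanced = balance_fusion_grads([[3.0, 4.0], [0.3, 0.4]])
print([round(l2(g), 4) for g in balanced])  # both norms become the mean, 2.75
```

This per-step rule is open-loop and fixed; a closed-loop scheme such as the one the abstract describes would instead adjust the rebalancing from feedback gathered during training.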
Taxonomy
Topics: Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning · Advanced Data Compression Techniques
