Q$^2$: Quantization-Aware Gradient Balancing and Attention Alignment for Low-Bit Quantization

Zhaoyang Wang; Dong Wang

arXiv:2511.05898·cs.CV·February 27, 2026

Open Access

TL;DR

This paper introduces Q$^2$, a quantization-aware training framework that improves low-bit quantization for complex visual tasks such as object detection and image segmentation by rebalancing gradients at feature fusion stages and aligning attention, with no extra inference cost.

Contribution

The paper proposes Q$^2$, a novel approach combining gradient balancing and attention alignment to enhance low-bit quantization in complex vision tasks.

Findings

01. +2.5% mAP on object detection

02. +3.7% mDICE on image segmentation

03. No inference overhead

Abstract

Quantization-aware training (QAT) has achieved remarkable success in low-bit ($\leq$ 4-bit) quantization for classification networks. However, when applied to more complex visual tasks such as object detection and image segmentation, performance still suffers significant degradation. A key cause of this limitation has been largely overlooked in the literature. In this work, we revisit this phenomenon from a new perspective and identify a major failure factor: gradient imbalance at feature fusion stages, induced by accumulated quantization errors. This imbalance biases the optimization trajectory and impedes convergence under low-bit quantization. Based on this diagnosis, we propose Q$^2$, a two-pronged framework comprising: (1) Quantization-aware Gradient Balancing Fusion (Q-GBFusion), a closed-loop mechanism that dynamically rebalances gradient contributions during feature fusion; and…
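As a rough illustration of why the low-bit regime is harder, the sketch below applies a generic symmetric uniform quantize-dequantize step (the kind of "fake quantization" commonly used in QAT) to random features and compares the error at 8, 4, and 2 bits. It is not the paper's Q-GBFusion or any other part of Q$^2$; the function names, the per-tensor max-based scale, and the Gaussian test data are all assumptions made for this example.

```python
import random


def fake_quantize(values, num_bits):
    """Symmetric uniform quantize-dequantize with one per-tensor scale.

    Hypothetical helper for illustration; real QAT setups often learn
    the scale instead of taking it from the max magnitude.
    """
    qmax = 2 ** (num_bits - 1) - 1                       # e.g. 7 for signed 4-bit
    scale = max(abs(v) for v in values) / qmax or 1.0    # fall back to 1.0 for all-zero input
    # Round to the integer grid, clamp to [-qmax, qmax], then map back to floats.
    return [max(-qmax, min(qmax, round(v / scale))) * scale for v in values]


def mean_squared_error(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


random.seed(0)
features = [random.gauss(0.0, 1.0) for _ in range(10_000)]  # stand-in activations

errors = {
    bits: mean_squared_error(features, fake_quantize(features, bits))
    for bits in (8, 4, 2)
}

for bits, err in errors.items():
    print(f"{bits}-bit quantization MSE: {err:.6f}")
```

The error grows sharply as the bit-width drops, which is the kind of per-layer error that, according to the abstract, accumulates and unbalances gradients at feature fusion stages.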

Taxonomy

Topics: Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning · Advanced Data Compression Techniques