GAQAT: Gradient-Adaptive Quantization-Aware Training for Domain Generalization
Jiacheng Jiang, Yuan Meng, Chen Tang, Han Yu, Qun Li, Zhi Wang, Wenwu Zhu

TL;DR
This paper introduces GAQAT, a novel low-precision training method that stabilizes quantized models for better domain generalization by addressing gradient conflicts in quantization scales.
Contribution
The paper proposes a gradient-adaptive quantization-aware training framework that stabilizes low-precision models for domain generalization, addressing gradient conflicts in quantization scales.
Findings
GAQAT improves 3-bit and 4-bit models' performance on PACS by up to 4.5%.
4-bit GAQAT achieves near-lossless performance on DomainNet, surpassing SOTA QAT.
Stabilizing gradient conflicts enhances low-precision models' out-of-domain generalization.
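As a rough illustration of what a "quantization scale" and a sign conflict on its gradient can look like, the sketch below fake-quantizes a tensor with a single scale and computes the gradient with respect to that scale. It assumes a symmetric uniform quantizer and an LSQ-style straight-through estimator; the summary does not state which quantizer the authors use, and `fake_quantize`, `scale_gradient`, and `signs_conflict` are hypothetical names chosen for this example, not the paper's method.

```python
import numpy as np

def fake_quantize(x, scale, bits=4):
    """Symmetric uniform fake quantization: round to the integer grid, clip, rescale."""
    qmin, qmax = -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    q = np.clip(np.round(x / scale), qmin, qmax)
    return q * scale

def scale_gradient(x, scale, upstream, bits=4):
    """Gradient of a loss w.r.t. the scale, given the upstream gradient w.r.t. the output.

    Assumption: LSQ-style straight-through estimator, i.e. round() is treated
    as identity in the backward pass and clipped values contribute qmin or qmax.
    """
    qmin, qmax = -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    v = x / scale
    d = np.where(v <= qmin, qmin, np.where(v >= qmax, qmax, np.round(v) - v))
    return float(np.sum(upstream * d))

def signs_conflict(g_a, g_b):
    """Two scalar gradients on the same scale conflict if they point in opposite directions."""
    return g_a * g_b < 0

# Hypothetical data: two objectives that disagree on how the scale should move.
x = np.array([0.26, 10.0])
g_a = scale_gradient(x, 0.5, upstream=np.array([1.0, 1.0]))
g_b = scale_gradient(x, 0.5, upstream=np.array([-1.0, -0.1]))
conflict = signs_conflict(g_a, g_b)
```

When two gradient signals for the same scale disagree like this, plain summation can make the scale update noisy; how GAQAT resolves such conflicts is not described in this summary.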
Abstract
Research on loss surface geometry, such as Sharpness-Aware Minimization (SAM), shows that flatter minima improve generalization. Recent studies further reveal that flatter minima can also reduce the domain generalization (DG) gap. However, existing flatness-based DG techniques predominantly operate within a full-precision training process, which is impractical for deployment on resource-constrained edge devices that typically rely on lower bit-width representations (e.g., 4 bits, 3 bits). Consequently, low-precision quantization-aware training is critical for optimizing these techniques in real-world applications. In this paper, we observe a significant degradation in performance when applying state-of-the-art DG-SAM methods to quantized models, suggesting that current approaches fail to preserve generalizability during the low-precision training process. To address this limitation, we…
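For readers unfamiliar with Sharpness-Aware Minimization, a single SAM update can be sketched as follows: take the gradient at the current weights, step a distance `rho` along its normalized direction, and use the gradient at that perturbed point for the actual descent step. This is a minimal full-precision sketch on a toy quadratic loss, not the DG-SAM variants or the quantized setting the abstract discusses; `sam_step`, the matrix `A`, and the hyperparameter values are assumptions made for illustration.

```python
import numpy as np

def sam_step(w, grad_fn, lr=0.05, rho=0.05):
    """One SAM update: descend using the gradient taken at a worst-case nearby point."""
    g = grad_fn(w)
    # Ascent direction of norm rho (small constant avoids division by zero).
    eps = rho * g / (np.linalg.norm(g) + 1e-12)
    g_sharp = grad_fn(w + eps)
    return w - lr * g_sharp

# Toy quadratic loss L(w) = 0.5 * w^T A w with one sharp and one flat direction.
A = np.diag([10.0, 0.1])

def loss(w):
    return 0.5 * float(w @ A @ w)

def grad(w):
    return A @ w

w0 = np.array([1.0, 1.0])
w = w0.copy()
for _ in range(200):
    w = sam_step(w, grad)
```

Each step costs two gradient evaluations instead of one, which is the usual price of SAM-style training.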
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning
Methods: Sharpness-Aware Minimization
