CR-QAT: Curriculum Relational Quantization-Aware Training for Open-Vocabulary Object Detection
Jinyeong Park, Donghwa Kang, Brent ByungHoon Kang, Hyeongboo Baek, Jibum Kim

TL;DR
This paper introduces CR-QAT, a novel training framework that combines curriculum-based quantization and relational knowledge distillation to effectively compress open-vocabulary object detection models without sacrificing fine-grained vision-language alignment.
Contribution
CR-QAT is the first integrated approach that addresses quantization errors in open-vocabulary detection through progressive model partitioning and relational knowledge transfer.
Findings
CR-QAT outperforms existing QAT methods under low-bit settings.
Achieves relative AP improvements of up to 38.9% on LVIS and 40.9% on COCO.
Effectively preserves vision-language alignment during aggressive quantization.
Abstract
Open-vocabulary object detection (OVOD) enables novel category detection via vision-language alignment, but massive model sizes hinder deployment on resource-constrained devices. While quantization offers practical compression, we reveal that naive extreme low-bit (e.g., 4-bit) quantization severely degrades fine-grained vision-language alignment and distorts inter-region relational structures. To address this, we propose curriculum relational quantization-aware training (CR-QAT), an integrated framework combining stage-by-stage optimization with relational knowledge distillation. Within CR-QAT, curriculum QAT (CQAT) mitigates error accumulation by partitioning the model for progressive quantization, ensuring stable optimization via error isolation. Concurrently, text-centric relational KD (TRKD) is applied to task-relevant modules. By constructing text-anchored pairwise similarity…
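The two ideas the abstract names for CQAT, low-bit quantization error and stage-by-stage partitioned quantization, can be illustrated with a minimal sketch. It is not the authors' implementation: the symmetric uniform quantizer, the partition names (`backbone`, `fusion`, `head`), their order, and the rule that earlier partitions stay quantized in later stages are all assumptions made here for illustration.

```python
import numpy as np

def fake_quantize(w, bits):
    """Symmetric uniform quantize-dequantize of an array (assumed quantizer)."""
    qmax = 2 ** (bits - 1) - 1            # 7 for 4-bit, 127 for 8-bit
    max_abs = np.max(np.abs(w))
    if max_abs == 0:
        return w.copy()
    scale = max_abs / qmax                # one scale for the whole tensor
    q = np.clip(np.round(w / scale), -qmax - 1, qmax)
    return q * scale                      # back to floating point

def curriculum_stages(partitions):
    """Yield (stage, quantized, full_precision) for a progressive schedule.

    Assumption: at stage k the first k partitions are quantized and the
    rest remain in full precision until their own stage.
    """
    for k in range(1, len(partitions) + 1):
        yield k, partitions[:k], partitions[k:]

# Compare reconstruction error at 4 and 8 bits on synthetic weights.
rng = np.random.default_rng(0)
w = rng.normal(size=10_000)
err4 = np.mean((w - fake_quantize(w, 4)) ** 2)
err8 = np.mean((w - fake_quantize(w, 8)) ** 2)

# Hypothetical partitioning of a detector into three blocks.
partitions = ["backbone", "fusion", "head"]
schedule = list(curriculum_stages(partitions))

if __name__ == "__main__":
    print(f"4-bit MSE: {err4:.6f}, 8-bit MSE: {err8:.6f}")
    for stage, quantized, full_precision in schedule:
        print(stage, quantized, full_precision)
```

In an actual QAT loop, each stage would fine-tune the model with the listed partitions fake-quantized in the forward pass before moving on to the next stage; the sketch only shows the schedule and the size of the error being isolated.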
Taxonomy
Topics: Advanced Neural Network Applications · Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning
