Parameter-Efficient Semantic Augmentation for Enhancing Open-Vocabulary Object Detection
Weihao Cao, Runqi Wang, Xiaoyue Duan, Jinchao Zhang, Ang Yang, Liping Jing

TL;DR
This paper introduces HSA-DINO, a parameter-efficient semantic augmentation framework that improves open-vocabulary object detection across diverse domains by leveraging hierarchical semantics and dynamic augmentation strategies.
Contribution
The paper proposes a novel multi-scale prompt bank and semantic-aware router to enhance domain adaptation and generalization in open-vocabulary object detection.
Findings
HSA-DINO outperforms previous methods on OV-COCO and domain-specific datasets.
The framework achieves a better balance between domain adaptability and open-vocabulary generalization.
Semantic augmentation improves detection performance in domain-shift scenarios.
Abstract
Open-vocabulary object detection (OVOD) enables models to detect any object category, including unseen ones. Benefiting from large-scale pre-training, existing OVOD methods achieve strong detection performance in general scenarios (e.g., OV-COCO) but suffer severe performance drops when transferred to downstream tasks with substantial domain shifts. This degradation stems from the scarcity and weak semantics of category labels in domain-specific tasks, as well as the inability of existing models to capture auxiliary semantics beyond coarse-grained category labels. To address these issues, we propose HSA-DINO, a parameter-efficient semantic augmentation framework for enhancing open-vocabulary object detection. Specifically, we propose a multi-scale prompt bank that leverages image feature pyramids to capture hierarchical semantics and select domain-specific local semantic prompts,…
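The abstract does not spell out how prompts are selected from the multi-scale bank, so the following is only an illustrative sketch of one plausible reading: each feature-pyramid level has its own small set of prompt keys, and the prompts whose keys are most similar to that level's pooled image feature are chosen. The cosine-similarity matching rule, the level names `P3` and `P5`, and the function `select_prompts` are all assumptions made for illustration; they are not taken from the paper.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a > 0 and norm_b > 0 else 0.0


def select_prompts(
    pooled_features: Dict[str, Vector],
    prompt_keys: Dict[str, List[Vector]],
    top_k: int = 1,
) -> Dict[str, List[int]]:
    """For each pyramid level, return indices of the top_k best-matching prompt keys.

    Hypothetical selection rule: rank keys by cosine similarity to the pooled
    feature of the same level. The paper may use a different mechanism.
    """
    selected: Dict[str, List[int]] = {}
    for level, feature in pooled_features.items():
        keys = prompt_keys[level]
        scores = [cosine(feature, key) for key in keys]
        ranked = sorted(range(len(keys)), key=lambda i: scores[i], reverse=True)
        selected[level] = ranked[:top_k]
    return selected


# Toy data: two pyramid levels, two candidate prompts per level.
pooled = {"P3": [1.0, 0.0], "P5": [0.0, 1.0]}
bank = {
    "P3": [[0.9, 0.1], [0.0, 1.0]],
    "P5": [[1.0, 0.0], [0.1, 0.9]],
}
chosen = select_prompts(pooled, bank, top_k=1)
print(chosen)
```

In this toy setup each level picks the prompt whose key points in roughly the same direction as its pooled feature, which is the sense in which a per-scale bank could supply "local" semantic prompts at different levels of the hierarchy.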
