Parameter-Efficient Semantic Augmentation for Enhancing Open-Vocabulary Object Detection

Weihao Cao; Runqi Wang; Xiaoyue Duan; Jinchao Zhang; Ang Yang; Liping Jing

arXiv:2604.04444·cs.CV·April 7, 2026

TL;DR

This paper introduces HSA-DINO, a parameter-efficient semantic augmentation framework that improves open-vocabulary object detection across diverse domains by leveraging hierarchical semantics and dynamic augmentation strategies.

Contribution

The paper proposes a novel multi-scale prompt bank and semantic-aware router to enhance domain adaptation and generalization in open-vocabulary object detection.

Findings

01

HSA-DINO outperforms previous methods on OV-COCO and domain-specific datasets.

02

The framework achieves a better balance between domain adaptability and open-vocabulary generalization.

03

Semantic augmentation improves detection performance in domain-shift scenarios.

Abstract

Open-vocabulary object detection (OVOD) enables models to detect any object category, including unseen ones. Benefiting from large-scale pre-training, existing OVOD methods achieve strong detection performance on general scenarios (e.g., OV-COCO) but suffer severe performance drops when transferred to downstream tasks with substantial domain shifts. This degradation stems from the scarcity and weak semantics of category labels in domain-specific tasks, as well as the inability of existing models to capture auxiliary semantics beyond coarse-grained category labels. To address these issues, we propose HSA-DINO, a parameter-efficient semantic augmentation framework for enhancing open-vocabulary object detection. Specifically, we propose a multi-scale prompt bank that leverages image feature pyramids to capture hierarchical semantics and select domain-specific local semantic prompts,…
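To make the idea of selecting prompts from a multi-scale bank more concrete, here is a minimal sketch in plain Python. The abstract excerpt does not say how HSA-DINO performs the selection, so this assumes a generic key-query lookup: each pyramid level has its own list of prompt keys, and the keys most similar (by cosine similarity) to that level's pooled image feature are chosen. The function names (`cosine`, `select_prompts`, `select_multi_scale`), the level names `P3` and `P4`, and the toy vectors are all hypothetical and are not taken from the paper.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def select_prompts(query: Vector, keys: List[Vector], top_k: int) -> List[int]:
    """Return the indices of the top_k keys most similar to the query."""
    ranked = sorted(
        range(len(keys)),
        key=lambda i: cosine(query, keys[i]),
        reverse=True,
    )
    return ranked[:top_k]


def select_multi_scale(
    pooled_features: Dict[str, Vector],
    prompt_keys: Dict[str, List[Vector]],
    top_k: int = 1,
) -> Dict[str, List[int]]:
    """Select prompts independently at each pyramid level.

    Assumption: one pooled feature vector and one key list per level.
    This is an illustrative stand-in, not the paper's selection rule.
    """
    return {
        level: select_prompts(feature, prompt_keys[level], top_k)
        for level, feature in pooled_features.items()
    }


# Toy example with two hypothetical pyramid levels and 2-D features.
pooled = {"P3": [1.0, 0.0], "P4": [0.0, 1.0]}
keys = {
    "P3": [[0.9, 0.1], [0.0, 1.0]],
    "P4": [[1.0, 0.0], [0.2, 0.8]],
}
chosen = select_multi_scale(pooled, keys, top_k=1)
print(chosen)
```

In this toy setup each level picks the key that points in nearly the same direction as its pooled feature. In a real detector the selected indices would look up learnable prompt vectors to be combined with the model's inputs; how that combination works in HSA-DINO is not covered by the excerpt.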
