VK-Det: Visual Knowledge Guided Prototype Learning for Open-Vocabulary Aerial Object Detection
Jianhang Yao, Yongbin Zheng, Siqi Lu, Wanying Xu, Peng Sun

TL;DR
VK-Det is a novel framework for open-vocabulary aerial object detection that leverages visual knowledge and prototype-based pseudo-labeling to improve detection of unseen categories without extra supervision.
Contribution
It introduces a visual knowledge-guided approach with a prototype-aware pseudo-labeling strategy, enabling better open-vocabulary detection without additional supervision.
Findings
Achieves state-of-the-art mAP on DIOR and DOTA datasets.
Outperforms methods that rely on extra supervision in open-vocabulary detection.
Utilizes intrinsic visual features for fine-grained localization.
Abstract
To identify objects beyond predefined categories, open-vocabulary aerial object detection (OVAD) leverages the zero-shot capabilities of visual-language models (VLMs) to generalize from base to novel categories. Existing approaches typically utilize self-learning mechanisms with weak text supervision to generate region-level pseudo-labels that align detectors with the VLMs' semantic spaces. However, text dependence induces semantic bias, restricting open-vocabulary expansion to text-specified concepts. We propose VK-Det, a Visual Knowledge-guided open-vocabulary object Detection framework without extra supervision. First, we discover and leverage the vision encoder's inherent informative region perception to attain fine-grained localization and adaptive distillation. Second, we introduce a novel prototype-aware pseudo-labeling strategy. It…
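The abstract is cut off before it describes the prototype-aware pseudo-labeling strategy, so the following is only a generic sketch of nearest-prototype pseudo-labeling, not the authors' method. It assumes each region is already represented by a feature vector, builds one prototype per group as the mean of its features, and assigns a pseudo-label only when the cosine similarity to the best prototype clears a threshold. All names (`build_prototypes`, `pseudo_label`, `min_similarity`, the toy groups) are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def mean_vector(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def build_prototypes(features_by_group):
    """One prototype per group: the mean of that group's feature vectors.

    Assumption: groups are given; the paper's way of forming them is not
    described in the truncated abstract.
    """
    return {name: mean_vector(feats) for name, feats in features_by_group.items()}


def pseudo_label(region_feature, prototypes, min_similarity=0.8):
    """Return (label, similarity) for the closest prototype.

    The label is None when the best similarity is below min_similarity,
    i.e. the region is left unlabeled. The threshold value is illustrative.
    """
    best_name, best_sim = None, -1.0
    for name, proto in prototypes.items():
        sim = cosine(region_feature, proto)
        if sim > best_sim:
            best_name, best_sim = name, sim
    if best_sim < min_similarity:
        return None, best_sim
    return best_name, best_sim


# Toy 2-D features standing in for region embeddings (hypothetical data).
features_by_group = {
    "group_a": [[1.0, 0.0], [0.9, 0.1]],
    "group_b": [[0.0, 1.0], [0.1, 0.9]],
}
prototypes = build_prototypes(features_by_group)

label, score = pseudo_label([0.8, 0.2], prototypes)          # close to group_a
ambiguous_label, _ = pseudo_label([0.5, 0.5], prototypes)    # between both groups
```

In this toy setup the first region is assigned to `group_a`, while the second falls below the threshold and stays unlabeled, which is the usual reason for thresholding: ambiguous regions are kept out of the pseudo-label set rather than forced into a class.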
Taxonomy
Topics: Multimodal Machine Learning Applications · Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning
