VK-Det: Visual Knowledge Guided Prototype Learning for Open-Vocabulary Aerial Object Detection

Jianhang Yao; Yongbin Zheng; Siqi Lu; Wanying Xu; Peng Sun

arXiv:2511.18075 · cs.CV · November 25, 2025

TL;DR

VK-Det is a novel framework for open-vocabulary aerial object detection that leverages visual knowledge and prototype-based pseudo-labeling to improve detection of unseen categories without extra supervision.

Contribution

It introduces a visual knowledge-guided approach with a prototype-aware pseudo-labeling strategy, enabling better open-vocabulary detection without additional supervision.

Findings

01. Achieves state-of-the-art mAP on the DIOR and DOTA datasets.

02. Outperforms methods that rely on extra supervision in open-vocabulary detection.

03. Uses intrinsic visual features for fine-grained localization.

Abstract

To identify objects beyond predefined categories, open-vocabulary aerial object detection (OVAD) leverages the zero-shot capabilities of visual-language models (VLMs) to generalize from base to novel categories. Existing approaches typically use self-learning mechanisms with weak text supervision to generate region-level pseudo-labels that align detectors with the VLMs' semantic spaces. However, text dependence induces semantic bias, restricting open-vocabulary expansion to text-specified concepts. We propose VK-Det, a Visual Knowledge-guided open-vocabulary object Detection framework without extra supervision. First, we discover and leverage the vision encoder's inherent informative region perception to attain fine-grained localization and adaptive distillation. Second, we introduce a novel prototype-aware pseudo-labeling strategy. It…
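
The abstract is cut off before it describes the prototype-aware pseudo-labeling strategy, so the following is only a generic illustration of prototype-based pseudo-labeling, not the authors' method. It assumes that each class prototype is the mean of L2-normalized features, that regions are matched to prototypes by cosine similarity, and that a fixed similarity threshold decides whether a region receives a label. The function names (`build_prototypes`, `pseudo_label`) and the threshold value are hypothetical.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def cosine(a, b):
    """Dot product; equals cosine similarity when both inputs are unit length."""
    return sum(x * y for x, y in zip(a, b))


def build_prototypes(features, labels):
    """Assumption: one prototype per class, the normalized mean of its features."""
    sums, counts = {}, {}
    for feat, label in zip(features, labels):
        feat = l2_normalize(feat)
        if label not in sums:
            sums[label] = [0.0] * len(feat)
            counts[label] = 0
        sums[label] = [s + x for s, x in zip(sums[label], feat)]
        counts[label] += 1
    return {
        label: l2_normalize([s / counts[label] for s in total])
        for label, total in sums.items()
    }


def pseudo_label(region_features, prototypes, threshold=0.8):
    """Give each region its nearest prototype's label, or None if too dissimilar."""
    assigned = []
    for feat in region_features:
        feat = l2_normalize(feat)
        best_label, best_sim = max(
            ((label, cosine(feat, proto)) for label, proto in prototypes.items()),
            key=lambda pair: pair[1],
        )
        # The threshold is a hypothetical hyperparameter, not a value from the paper.
        assigned.append(best_label if best_sim >= threshold else None)
    return assigned


# Toy 2-D features standing in for region embeddings from a vision encoder.
protos = build_prototypes(
    [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]],
    ["ship", "ship", "plane"],
)
labels = pseudo_label([[0.95, 0.05], [0.1, 0.9], [0.7, 0.7]], protos)
```

In this toy run the first two regions sit close to a single prototype and receive its label, while the third lies between both prototypes, falls below the threshold, and is left unlabeled. Leaving ambiguous regions unlabeled is a common way to limit noisy pseudo-labels; whether VK-Det does this is not stated in the visible part of the abstract.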

Taxonomy

Topics: Multimodal Machine Learning Applications · Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning