Toward Open Vocabulary Aerial Object Detection with CLIP-Activated Student-Teacher Learning
Yan Li, Weiwei Guo, Xue Yang, Ning Liao, Dunyun He, Jiaqi Zhou, Wenxian Yu

TL;DR
This paper introduces CastDet, a novel CLIP-activated student-teacher framework for open-vocabulary aerial object detection. It substantially improves detection of unseen (novel) categories by transferring knowledge from CLIP and by using pseudo-labeling strategies.
Contribution
The paper develops the first open-vocabulary detection method tailored to aerial images, combining CLIP with a dynamic pseudo-labeling approach inside a student-teacher framework.
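The summary does not say how the teacher in the student-teacher framework is kept up to date. In many semi-supervised detectors the teacher is an exponential moving average (EMA) of the student, and the minimal sketch below assumes that convention. The `ema_update` name, the dict-of-float-lists parameter layout, and the default momentum of 0.999 are illustrative assumptions, not details taken from CastDet.

```python
from typing import Dict, List

# Hypothetical parameter container: parameter name -> flat list of float weights.
Params = Dict[str, List[float]]


def ema_update(teacher: Params, student: Params, momentum: float = 0.999) -> Params:
    """Return new teacher weights as an exponential moving average of the student.

    Assumed rule (not confirmed by the paper summary), applied element-wise:
        teacher <- momentum * teacher + (1 - momentum) * student
    """
    if not 0.0 <= momentum <= 1.0:
        raise ValueError("momentum must lie in [0, 1]")
    updated: Params = {}
    for name, t_vals in teacher.items():
        s_vals = student[name]
        if len(s_vals) != len(t_vals):
            raise ValueError(f"shape mismatch for parameter {name!r}")
        # Blend each teacher weight slowly toward the corresponding student weight.
        updated[name] = [momentum * t + (1.0 - momentum) * s for t, s in zip(t_vals, s_vals)]
    return updated


# Usage: one update step with a small momentum so the change is visible.
new_teacher = ema_update({"w": [1.0, 0.0]}, {"w": [0.0, 1.0]}, momentum=0.9)
print(new_teacher)
```

A momentum close to 1 makes the teacher change slowly, which tends to give steadier pseudo-labels than using the student's own rapidly changing predictions.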
Findings
Achieves 46.5% mAP on VisDroneZSD novel categories.
Outperforms state-of-the-art open-vocabulary detectors by 21.0% mAP.
Demonstrates effectiveness of CLIP-activated self-learning for aerial object detection.
Abstract
An increasingly massive number of remote-sensing images spurs the development of extensible object detectors that can detect objects beyond their training categories without the cost of collecting new labeled data. In this paper, we aim to develop an open-vocabulary object detection (OVD) technique for aerial images that scales the object vocabulary beyond the training data. The performance of OVD relies heavily on the quality of class-agnostic region proposals and of pseudo-labels for novel object categories. To generate high-quality proposals and pseudo-labels simultaneously, we propose CastDet, a CLIP-activated student-teacher open-vocabulary object Detection framework. Our end-to-end framework, which follows the student-teacher self-learning mechanism, employs the RemoteCLIP model as an extra omniscient teacher with rich knowledge. By doing so, our approach boosts not only novel object proposals but also…
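The abstract states that RemoteCLIP acts as an extra teacher for novel-category pseudo-labels but does not describe the mechanism in this excerpt. One common way to use a CLIP-style model for this is zero-shot classification: compare a region's image embedding with text-prompt embeddings for each category and keep only confident matches. The minimal sketch below assumes that approach. The plain float lists stand in for RemoteCLIP image and text features, and `clip_pseudo_label`, the temperature of 0.01, and the 0.5 threshold are hypothetical choices, not CastDet's actual procedure.

```python
import math
from typing import List, Optional, Sequence, Tuple

Vector = Sequence[float]


def _normalize(v: Vector) -> List[float]:
    """Scale a vector to unit L2 norm so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def clip_pseudo_label(
    region_emb: Vector,
    class_text_embs: Sequence[Vector],
    temperature: float = 0.01,  # assumed; CLIP-style models scale logits by roughly 1/0.01
    threshold: float = 0.5,     # assumed confidence cut-off for accepting a pseudo-label
) -> Optional[Tuple[int, float]]:
    """Assign a pseudo-label to one region proposal via CLIP-style zero-shot matching.

    Cosine similarities between the region embedding and each class-prompt
    embedding are turned into probabilities with a temperature-scaled softmax.
    Returns (class_index, probability) if the top probability reaches the
    threshold, otherwise None (the proposal is not used as a pseudo-label).
    """
    r = _normalize(region_emb)
    logits = [sum(a * b for a, b in zip(r, _normalize(t))) / temperature for t in class_text_embs]
    m = max(logits)  # subtract the max before exponentiating for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    best = max(range(len(probs)), key=probs.__getitem__)
    return (best, probs[best]) if probs[best] >= threshold else None


# Usage with toy 2-D embeddings for two hypothetical category prompts.
text_embs = [[1.0, 0.0], [0.0, 1.0]]
print(clip_pseudo_label([0.9, 0.1], text_embs))                 # confident match to class 0
print(clip_pseudo_label([1.0, 1.0], text_embs, threshold=0.6))  # ambiguous, so rejected
```

Rejecting low-confidence regions trades recall for pseudo-label precision; how CastDet actually balances this (including its "dynamic" pseudo-labeling) is not specified in this summary.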
Taxonomy
Topics: Advanced Neural Network Applications · Advanced Image and Video Retrieval Techniques · Multimodal Machine Learning Applications
Methods: Sparse Evolutionary Training · Self-Learning · Contrastive Language-Image Pre-training
