TL;DR
This paper introduces FLAME, an active learning strategy for rapidly adapting open-vocabulary object detection models on the fly to specialized domains such as remote sensing, reaching high accuracy from only a few user annotations.
Contribution
It proposes a cascaded framework that combines zero-shot detection with a lightweight classifier trained in real time, and introduces FLAME, an active learning method that selects samples efficiently without fine-tuning the full model.
Findings
Outperforms state-of-the-art methods on remote sensing benchmarks.
Achieves high accuracy with less than a minute of adaptation time.
Reduces annotation costs by using few-shot learning.
Abstract
Open-vocabulary object detection (OVD) models offer remarkable flexibility by detecting objects from arbitrary text queries. However, their zero-shot performance in specialized domains like Remote Sensing (RS) is often compromised by the inherent ambiguity of natural language, limiting critical downstream applications. For instance, an OVD model may struggle to distinguish between fine-grained classes such as "fishing boat" and "yacht" since their embeddings are similar and often inseparable. This can hamper specific user goals, such as monitoring illegal fishing, by producing irrelevant detections. To address this, we propose a cascaded approach that couples the broad generalization of a large pre-trained OVD model with a lightweight few-shot classifier. Our method first employs the zero-shot model to generate high-recall object proposals. These proposals are then refined for high…
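The two-stage cascade described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes the zero-shot detector has already produced a confidence score and a fixed-length embedding per proposal, and it swaps in plain logistic regression as a stand-in for the lightweight few-shot classifier. The function names, thresholds, and hyperparameters are all hypothetical, and the FLAME sample-selection step is not modeled.

```python
import numpy as np

def filter_proposals(scores, recall_threshold=0.1):
    """Stage 1 (assumed): keep every proposal whose zero-shot score clears
    a deliberately low threshold, favoring recall over precision."""
    return np.flatnonzero(scores >= recall_threshold)

def _sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_few_shot_classifier(X, y, lr=0.5, steps=200, l2=1e-3):
    """Stage 2 (assumed): fit logistic regression on frozen proposal
    embeddings X with binary user labels y, using plain gradient descent."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(steps):
        p = _sigmoid(X @ w + b)
        w -= lr * (X.T @ (p - y) / len(y) + l2 * w)  # L2-regularized gradient
        b -= lr * np.mean(p - y)
    return w, b

def refine(embeddings, w, b, threshold=0.5):
    """Return a boolean mask marking proposals the classifier accepts."""
    return _sigmoid(embeddings @ w + b) >= threshold

# Synthetic stand-in data: two well-separated embedding clusters.
rng = np.random.default_rng(0)
target = rng.normal(2.0, 0.5, size=(5, 8))       # e.g. "fishing boat" examples
distractor = rng.normal(-2.0, 0.5, size=(5, 8))  # e.g. "yacht" examples
X = np.vstack([target, distractor])
y = np.array([1.0] * 5 + [0.0] * 5)

w, b = train_few_shot_classifier(X, y)
kept = filter_proposals(np.array([0.05, 0.2, 0.9]))  # indices of retained proposals
mask = refine(rng.normal(2.0, 0.5, size=(3, 8)), w, b)
```

Because only the small classifier is trained while the detector stays frozen, the per-update cost depends on the number of labeled proposals and the embedding dimension rather than on the size of the detector.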