Language-Driven Active Learning for Diverse Open-Set 3D Object Detection
Ross Greer, Bjørk Antoniussen, Andreas Møgelmose, Mohan Trivedi

TL;DR
This paper introduces VisLED, a language-driven active learning framework that enhances open-set 3D object detection by selecting diverse and informative samples, improving detection of novel and underrepresented objects in autonomous driving.
Contribution
The paper proposes the VisLED-Querying algorithm, which operates in open-world and closed-world settings to improve data sampling for 3D object detection models.
Findings
VisLED-Querying outperforms random sampling in efficiency.
It offers competitive results compared to entropy-based methods.
It demonstrates effectiveness on the nuScenes dataset.
Abstract
Object detection is crucial for ensuring safe autonomous driving. However, data-driven approaches face challenges when encountering minority or novel objects in the 3D driving scene. In this paper, we propose VisLED, a language-driven active learning framework for diverse open-set 3D Object Detection. Our method leverages active learning techniques to query diverse and informative data samples from an unlabeled pool, enhancing the model's ability to detect underrepresented or novel objects. Specifically, we introduce the Vision-Language Embedding Diversity Querying (VisLED-Querying) algorithm, which operates in both open-world exploring and closed-world mining settings. In open-world exploring, VisLED-Querying selects data points most novel relative to existing data, while in closed-world mining, it mines novel instances of known classes. We evaluate our approach on the nuScenes dataset…
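The open-world exploring idea described above (select the unlabeled samples most novel relative to existing data) can be illustrated with a small sketch. This is not the authors' VisLED-Querying implementation: it swaps in a generic greedy farthest-point selection over embedding vectors, and it assumes the embeddings have already been produced by some vision-language encoder. The function names `cosine_distance` and `select_most_novel`, and the use of cosine distance as the novelty measure, are hypothetical choices made for illustration only.

```python
import math


def cosine_distance(a, b):
    """Cosine distance between two non-zero vectors (0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def select_most_novel(labeled, unlabeled, budget):
    """Greedy farthest-point selection (illustrative, not the paper's method).

    labeled / unlabeled: lists of embedding vectors (assumed precomputed).
    Returns indices into `unlabeled`, most novel first.
    """
    # Novelty score: distance from each unlabeled sample to its nearest
    # already-covered sample. With no labeled data, everything is novel.
    min_dist = [
        min((cosine_distance(u, l) for l in labeled), default=math.inf)
        for u in unlabeled
    ]
    chosen = []
    for _ in range(min(budget, len(unlabeled))):
        # Pick the remaining sample farthest from everything covered so far.
        idx = max(
            (i for i in range(len(unlabeled)) if i not in chosen),
            key=lambda i: min_dist[i],
        )
        chosen.append(idx)
        # The picked sample now counts as covered, so samples near it
        # become less novel; this keeps the selected batch diverse.
        for i, u in enumerate(unlabeled):
            min_dist[i] = min(min_dist[i], cosine_distance(u, unlabeled[idx]))
    return chosen
```

With a labeled set of `[[1.0, 0.0]]` and an unlabeled pool of `[[1.0, 0.1], [0.0, 1.0], [0.9, 0.1], [-1.0, 0.0]]`, a budget of 2 returns `[3, 1]`: the two near-duplicates of the labeled vector are skipped in favor of the two directions not yet covered. A closed-world variant in the spirit of the abstract could, under the same assumptions, run the same selection separately within each known class.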
Taxonomy
Topics: Machine Learning and Algorithms · Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications
