Language-Driven Active Learning for Diverse Open-Set 3D Object Detection

Ross Greer; Bjørk Antoniussen; Andreas Møgelmose; Mohan Trivedi

arXiv:2404.12856·cs.CV·June 19, 2024·1 cites

TL;DR

This paper introduces VisLED, a language-driven active learning framework that enhances open-set 3D object detection by selecting diverse and informative samples, improving detection of novel and underrepresented objects in autonomous driving.

Contribution

The paper proposes the VisLED-Querying algorithm, which operates in open-world and closed-world settings to improve data sampling for 3D object detection models.

Findings

01 VisLED-Querying outperforms random sampling in efficiency.

02 It offers competitive results compared to entropy-based methods.

03 It demonstrates its effectiveness on the nuScenes dataset.

Abstract

Object detection is crucial for ensuring safe autonomous driving. However, data-driven approaches face challenges when encountering minority or novel objects in the 3D driving scene. In this paper, we propose VisLED, a language-driven active learning framework for diverse open-set 3D Object Detection. Our method leverages active learning techniques to query diverse and informative data samples from an unlabeled pool, enhancing the model's ability to detect underrepresented or novel objects. Specifically, we introduce the Vision-Language Embedding Diversity Querying (VisLED-Querying) algorithm, which operates in both open-world exploring and closed-world mining settings. In open-world exploring, VisLED-Querying selects data points most novel relative to existing data, while in closed-world mining, it mines novel instances of known classes. We evaluate our approach on the nuScenes dataset…
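
The abstract does not spell out how VisLED-Querying scores novelty, so the following is only a minimal sketch of one common way to do embedding-diversity querying: greedy farthest-point selection over precomputed embeddings. Everything here is an assumption for illustration, not the authors' implementation: the function names (`novelty_scores`, `query_diverse`, `query_closed_world`), the use of Euclidean distance, and the idea that each unlabeled scene already has a vision-language embedding and, for the closed-world case, a hypothetical predicted class label.

```python
import numpy as np


def novelty_scores(pool_emb, labeled_emb):
    """Distance from each pool embedding to its nearest labeled embedding."""
    if len(labeled_emb) == 0:
        # Nothing labeled yet: every pool item counts as maximally novel.
        return np.full(len(pool_emb), np.inf)
    # Pairwise Euclidean distances, shape (n_pool, n_labeled).
    diffs = pool_emb[:, None, :] - labeled_emb[None, :, :]
    return np.linalg.norm(diffs, axis=-1).min(axis=1)


def query_diverse(pool_emb, labeled_emb, budget):
    """Open-world style query (assumed): greedily pick the pool items that
    are farthest from both the labeled set and the items already picked."""
    min_dist = novelty_scores(pool_emb, labeled_emb)
    selected = []
    for _ in range(min(budget, len(pool_emb))):
        idx = int(np.argmax(min_dist))
        selected.append(idx)
        # The new pick now acts as a reference point for later picks.
        dist_to_pick = np.linalg.norm(pool_emb - pool_emb[idx], axis=1)
        min_dist = np.minimum(min_dist, dist_to_pick)
        min_dist[idx] = -np.inf  # never select the same item twice
    return selected


def query_closed_world(pool_emb, pool_cls, labeled_emb, labeled_cls, per_class):
    """Closed-world style query (assumed): run the same selection separately
    inside each known class, using hypothetical predicted labels pool_cls."""
    picks = []
    for c in np.unique(pool_cls):
        pool_idx = np.flatnonzero(pool_cls == c)
        reference = labeled_emb[labeled_cls == c]
        local = query_diverse(pool_emb[pool_idx], reference, per_class)
        picks.extend(pool_idx[local].tolist())
    return picks


if __name__ == "__main__":
    pool = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [10.0, 0.0]])
    labeled = np.array([[0.0, 0.0]])
    print(query_diverse(pool, labeled, budget=2))
```

In this toy example the two selected indices are the points far from the single labeled embedding, while the near-duplicate at `[0.1, 0.0]` is skipped. A real pipeline would replace the toy arrays with embeddings from a vision-language model and send the selected scenes for annotation.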

Code & Models

Repositories

bjork-crypto/visled-querying (Official)

Taxonomy

Topics: Machine Learning and Algorithms · Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications