Vision-Language Guidance for LiDAR-based Unsupervised 3D Object Detection

Christian Fruhwirth-Reisinger; Wei Lin; Dušan Malić; Horst Bischof; Horst Possegger

arXiv:2408.03790·cs.CV·August 8, 2024

TL;DR

This paper introduces an unsupervised 3D object detection method for LiDAR point clouds that uses vision-language guidance to detect and classify both static and moving objects without manual labels, outperforming previous unsupervised methods on the Waymo Open Dataset and Argoverse 2.

Contribution

The paper presents a vision-language-guided unsupervised 3D detection approach that classifies LiDAR point clusters without requiring multiple drives through the same scene or a precisely calibrated and synchronized camera setup.

Findings

01 Outperforms existing unsupervised 3D object detectors on the Waymo Open Dataset and Argoverse 2.

02 Improves over prior unsupervised detectors by +23 AP3D on the Waymo Open Dataset and +7.9 AP3D on Argoverse 2.

03 Provides class labels that are not based solely on object-size assumptions.

Abstract

Accurate 3D object detection in LiDAR point clouds is crucial for autonomous driving systems. To achieve state-of-the-art performance, the supervised training of detectors requires large amounts of human-annotated data, which is expensive to obtain and restricted to predefined object categories. To mitigate manual labeling efforts, recent unsupervised object detection approaches generate class-agnostic pseudo-labels for moving objects, subsequently serving as supervision signal to bootstrap a detector. Despite promising results, these approaches do not provide class labels or generalize well to static objects. Furthermore, they are mostly restricted to data containing multiple drives from the same scene or images from a precisely calibrated and synchronized camera setup. To overcome these limitations, we propose a vision-language-guided unsupervised 3D detection approach that operates…

Code & Models

Repositories

chreisinger/ViLGOD (PyTorch, official)

Taxonomy

Topics: Advanced Neural Network Applications · Advanced Image and Video Retrieval Techniques · Robotics and Automated Systems

Methods: Contrastive Language-Image Pre-training
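
As a rough illustration of the contrastive language-image pre-training idea listed under Methods, zero-shot classification assigns an input embedding to the class whose text embedding is most similar to it. The sketch below is not the paper's pipeline: the 3-dimensional vectors, the class names, the `zero_shot_classify` helper, and the temperature value are all made up for illustration, and real embeddings would come from trained encoders.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def zero_shot_classify(embedding, text_embeddings, temperature=0.01):
    """Return (best_class, probabilities) for one input embedding.

    Similarities are divided by a temperature and passed through a
    softmax, as in CLIP-style zero-shot classification. The value
    0.01 is an illustrative assumption.
    """
    names = list(text_embeddings)
    logits = [cosine(embedding, text_embeddings[n]) / temperature for n in names]
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    probs = {n: e / total for n, e in zip(names, exps)}
    best = max(probs, key=probs.get)
    return best, probs


# Hypothetical text embeddings for three class prompts (toy values).
TEXT_EMBEDDINGS = {
    "vehicle": [0.9, 0.1, 0.0],
    "pedestrian": [0.1, 0.9, 0.1],
    "cyclist": [0.2, 0.3, 0.9],
}

# Hypothetical embedding of one object; closest in direction to "vehicle".
label, probs = zero_shot_classify([0.8, 0.2, 0.1], TEXT_EMBEDDINGS)
print(label, {name: round(p, 3) for name, p in probs.items()})
```

Because only the direction of the vectors matters under cosine similarity, the class prompts can be changed without retraining anything, which is what makes this style of classification label-free.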