Vision-Language Guidance for LiDAR-based Unsupervised 3D Object Detection
Christian Fruhwirth-Reisinger, Wei Lin, Dušan Malić, Horst Bischof, Horst Possegger

TL;DR
This paper introduces a novel unsupervised 3D object detection method using LiDAR data guided by vision-language models, which classifies static and moving objects without manual labels, outperforming previous methods on major datasets.
Contribution
The paper presents a vision-language-guided unsupervised detection approach that classifies LiDAR point clusters without relying on multiple drives or camera calibration, advancing the state-of-the-art.
Findings
Outperforms existing unsupervised detectors on Waymo and Argoverse 2 datasets.
Improves AP3D over prior unsupervised detectors by +23 on Waymo and +7.9 on Argoverse 2.
Provides class labels without size-based assumptions.
Abstract
Accurate 3D object detection in LiDAR point clouds is crucial for autonomous driving systems. To achieve state-of-the-art performance, the supervised training of detectors requires large amounts of human-annotated data, which is expensive to obtain and restricted to predefined object categories. To mitigate manual labeling efforts, recent unsupervised object detection approaches generate class-agnostic pseudo-labels for moving objects, subsequently serving as supervision signal to bootstrap a detector. Despite promising results, these approaches do not provide class labels or generalize well to static objects. Furthermore, they are mostly restricted to data containing multiple drives from the same scene or images from a precisely calibrated and synchronized camera setup. To overcome these limitations, we propose a vision-language-guided unsupervised 3D detection approach that operates…
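The abstract is truncated before it explains how vision-language guidance assigns class labels to LiDAR point clusters, so the following is only a generic sketch of CLIP-style zero-shot classification, not the authors' pipeline. It assumes that each cluster has already been mapped to an embedding in the same space as the text embeddings of the class prompts. The function names, the class names, the temperature value, and the toy 3-dimensional vectors are all hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def softmax(scores, temperature=0.01):
    """Temperature-scaled softmax; the temperature is an assumed value."""
    scaled = [s / temperature for s in scores]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def classify_cluster(cluster_embedding, text_embeddings):
    """Return the best-matching class name and per-class probabilities.

    `text_embeddings` maps a class name to the embedding of its text prompt.
    """
    names = list(text_embeddings)
    sims = [cosine_similarity(cluster_embedding, text_embeddings[n]) for n in names]
    probs = softmax(sims)
    best = max(range(len(names)), key=lambda i: probs[i])
    return names[best], dict(zip(names, probs))


# Toy embeddings (hypothetical); real ones would come from a vision-language model.
text_embeddings = {
    "vehicle": [0.9, 0.1, 0.0],
    "pedestrian": [0.1, 0.9, 0.1],
    "cyclist": [0.2, 0.3, 0.9],
}
cluster_embedding = [0.8, 0.2, 0.1]

label, probs = classify_cluster(cluster_embedding, text_embeddings)
print(label)
```

Because labels come from similarity to text prompts instead of box dimensions, a scheme like this needs no size-based class rules, which matches the finding listed above.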
Taxonomy
Topics: Advanced Neural Network Applications · Advanced Image and Video Retrieval Techniques · Robotics and Automated Systems
Methods: Contrastive Language-Image Pre-training
