Just Add Geometry: Gradient-Free Open-Vocabulary 3D Detection Without Human-in-the-Loop
Atharv Goel, Mehar Khurana

TL;DR
This paper introduces a training-free, open-vocabulary 3D object detection method that combines 2D vision-language models with camera geometry and geometric box fitting to detect diverse objects without human-annotated 3D labels.
Contribution
It presents a novel pipeline that uses 2D foundation models and geometric inflation to perform open-vocabulary 3D detection without any training or manual 3D annotations.
Findings
Achieves competitive localization performance with LiDAR and RGB inputs.
Operates without training, demonstrating the potential of 2D models for 3D perception.
Works effectively in adverse conditions with a fog-augmented dataset.
Abstract
Modern 3D object detection datasets are constrained by narrow class taxonomies and costly manual annotations, limiting their ability to scale to open-world settings. In contrast, 2D vision-language models trained on web-scale image-text pairs exhibit rich semantic understanding and support open-vocabulary detection via natural language prompts. In this work, we leverage the maturity and category diversity of 2D foundation models to perform open-vocabulary 3D object detection without any human-annotated 3D labels. Our pipeline uses a 2D vision-language detector to generate text-conditioned proposals, which are segmented with SAM and back-projected into 3D using camera geometry and either LiDAR or monocular pseudo-depth. We introduce a geometric inflation strategy based on DBSCAN clustering and Rotating Calipers to infer 3D bounding boxes without training. To simulate adverse real-world…
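The geometric half of the pipeline described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names (`backproject`, `convex_hull`, `min_area_rect`, `box_from_points`) are hypothetical, the camera is assumed to be an ideal pinhole with intrinsics `K`, and the box-fitting step assumes the points are already a single cluster in a z-up frame (the DBSCAN step is omitted). The rectangle search tests every hull-edge direction directly, which is the simple quadratic-time counterpart of Rotating Calipers and returns the same minimum-area rectangle.

```python
import numpy as np

def backproject(pixels, depths, K):
    """Lift (u, v) pixels with metric depth into camera-frame 3D points."""
    fx, fy, cx, cy = K[0, 0], K[1, 1], K[0, 2], K[1, 2]
    x = (pixels[:, 0] - cx) * depths / fx
    y = (pixels[:, 1] - cy) * depths / fy
    return np.stack([x, y, depths], axis=1)

def convex_hull(pts):
    """Andrew's monotone chain; hull vertices in counter-clockwise order."""
    pts = np.unique(pts, axis=0)  # unique rows, sorted lexicographically
    if len(pts) < 3:
        return pts

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in pts[::-1]:
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return np.array(lower[:-1] + upper[:-1])

def min_area_rect(pts):
    """Return (center, (size_a, size_b), yaw) of the minimum-area rectangle.

    The optimal rectangle shares a side with a hull edge, so only the
    hull-edge directions (folded into [0, pi/2)) need to be tested.
    """
    hull = convex_hull(pts)
    edges = np.roll(hull, -1, axis=0) - hull
    angles = np.unique(np.mod(np.arctan2(edges[:, 1], edges[:, 0]), np.pi / 2))
    best = None
    for a in angles:
        c, s = np.cos(a), np.sin(a)
        R = np.array([[c, s], [-s, c]])  # maps world coords into the edge-aligned frame
        rot = hull @ R.T
        lo, hi = rot.min(axis=0), rot.max(axis=0)
        area = np.prod(hi - lo)
        if best is None or area < best[0]:
            best = (area, R.T @ ((lo + hi) / 2), hi - lo, a)
    return best[1], best[2], best[3]

def box_from_points(points):
    """Fit an upright 3D box to one cluster of points (z-up frame assumed)."""
    center_xy, size_xy, yaw = min_area_rect(points[:, :2])
    z_lo, z_hi = points[:, 2].min(), points[:, 2].max()
    center = np.array([center_xy[0], center_xy[1], (z_lo + z_hi) / 2])
    size = np.array([size_xy[0], size_xy[1], z_hi - z_lo])
    return center, size, yaw
```

In a fuller version, the masked pixels from the segmentation step would be lifted with `backproject` (or replaced by the LiDAR points that fall inside the mask), transformed into a z-up frame, and clustered before the largest cluster is passed to `box_from_points`.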
Taxonomy
Topics: Advanced Neural Network Applications · Multimodal Machine Learning Applications · 3D Shape Modeling and Analysis
