WildDet3D: Scaling Promptable 3D Detection in the Wild
Weikai Huang, Jieyu Zhang, Sijun Li, Taoyang Jia, Jiafei Duan, Yunqian Cheng, Jaemin Cho, Matthew Wallingford, Rustin Soraki, Chris Dongjoo Kim, Shuo Liu, Donovan Clay, Taira Anderson, Winson Han, Ali Farhadi, Bharath Hariharan, Zhongzheng Ren, Ranjay Krishna

TL;DR
WildDet3D introduces a versatile, geometry-aware 3D detection architecture that supports multiple prompts and leverages geometric cues, alongside a large diverse dataset, to advance open-world 3D object understanding.
Contribution
The paper presents WildDet3D, a unified architecture for promptable 3D detection, and WildDet3D-Data, the largest open 3D detection dataset, enabling significant progress in open-world 3D understanding.
Findings
WildDet3D achieves state-of-the-art results on multiple benchmarks.
Incorporating depth cues at inference improves AP by 20.7 on average.
WildDet3D performs well in zero-shot settings across diverse datasets.
Abstract
Understanding objects in 3D from a single image is a cornerstone of spatial intelligence. A key step toward this goal is monocular 3D object detection--recovering the extent, location, and orientation of objects from an input RGB image. To be practical in the open world, such a detector must generalize beyond closed-set categories, support diverse prompt modalities, and leverage geometric cues when available. Progress is hampered by two bottlenecks: existing methods are designed for a single prompt type and lack a mechanism to incorporate additional geometric cues, and current 3D datasets cover only narrow categories in controlled environments, limiting open-world transfer. In this work we address both gaps. First, we introduce WildDet3D, a unified geometry-aware architecture that natively accepts text, point, and box prompts and can incorporate auxiliary depth signals at inference…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Code & Models
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
