Zoo3D: Zero-Shot 3D Object Detection at Scene Level

Andrey Lemeshko; Bulat Gabdullin; Nikita Drozdov; Anton Konushin; Danila Rukhovich; Maksim Kolodiazhnyi

arXiv:2511.20253 · cs.CV · November 26, 2025

TL;DR

Zoo3D introduces a training-free, zero-shot 3D object detection framework that constructs 3D bounding boxes and assigns semantic labels without prior training, achieving state-of-the-art results on multiple benchmarks.

Contribution

Zoo3D is the first training-free 3D object detection method. It operates in a zero-shot mode and extends beyond point clouds to posed and unposed images, advancing open-vocabulary 3D understanding.

Findings

01

Zero-shot Zoo3D$_0$ outperforms existing self-supervised methods.

02

Both Zoo3D$_0$ and Zoo3D$_1$ achieve state-of-the-art results.

03

The method works directly with posed and unposed images.

Abstract

3D object detection is fundamental for spatial understanding. Real-world environments demand models capable of recognizing diverse, previously unseen objects, which remains a major limitation of closed-set methods. Existing open-vocabulary 3D detectors relax annotation requirements but still depend on training scenes, either as point clouds or images. We take this a step further by introducing Zoo3D, the first training-free 3D object detection framework. Our method constructs 3D bounding boxes via graph clustering of 2D instance masks, then assigns semantic labels using a novel open-vocabulary module with best-view selection and view-consensus mask generation. Zoo3D operates in two modes: the zero-shot Zoo3D$_0$, which requires no training at all, and the self-supervised Zoo3D$_1$, which refines 3D box prediction by training a class-agnostic detector on Zoo3D$_0$-generated pseudo…
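The abstract describes box construction only at a high level. The sketch below illustrates the general idea of building 3D boxes by graph clustering of 2D instance masks; it is not the authors' implementation, and every name in it is hypothetical. Assumptions: each 2D mask has already been lifted to a set of scene point indices (for example via depth and camera poses), two masks are linked when their point-set IoU reaches a hypothetical `iou_threshold`, clusters are the connected components of that graph, boxes are axis-aligned, and `best_view` is a simple stand-in for best-view selection that picks the view whose mask covers the most points.

```python
from collections import defaultdict


def mask_iou(a: set[int], b: set[int]) -> float:
    """IoU of two masks, each assumed to be lifted to a set of 3D point indices."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def cluster_masks(masks: list[dict], iou_threshold: float = 0.5) -> list[list[int]]:
    """Connected components of the mask graph (edge if IoU >= iou_threshold), via union-find."""
    parent = list(range(len(masks)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(masks)):
        for j in range(i + 1, len(masks)):
            if mask_iou(masks[i]["points"], masks[j]["points"]) >= iou_threshold:
                parent[find(i)] = find(j)  # merge the two components

    clusters = defaultdict(list)
    for i in range(len(masks)):
        clusters[find(i)].append(i)
    return list(clusters.values())


def axis_aligned_box(point_ids: set[int], xyz: list[tuple]) -> tuple:
    """Tight axis-aligned box (min corner, max corner) around the merged points."""
    pts = [xyz[p] for p in point_ids]
    lo = tuple(min(c[k] for c in pts) for k in range(3))
    hi = tuple(max(c[k] for c in pts) for k in range(3))
    return lo, hi


def best_view(member_ids: list[int], masks: list[dict]) -> int:
    """Hypothetical proxy for best-view selection: view of the mask covering the most points."""
    best = max(member_ids, key=lambda i: len(masks[i]["points"]))
    return masks[best]["view"]


def detect(masks: list[dict], xyz: list[tuple], iou_threshold: float = 0.5) -> list[dict]:
    """One class-agnostic detection (box + chosen view) per mask cluster."""
    detections = []
    for members in cluster_masks(masks, iou_threshold):
        point_ids = set().union(*(masks[i]["points"] for i in members))
        detections.append({
            "box": axis_aligned_box(point_ids, xyz),
            "best_view": best_view(members, masks),
        })
    return detections
```

For example, with points `xyz = [(0,0,0), (1,0,0), (1,1,0), (1,1,1), (5,5,5), (6,5,5), (6,6,6)]` and three masks covering points `{0,1,2}` (view 0), `{0,1,2,3}` (view 1) and `{4,5,6}` (view 2), the first two masks overlap with IoU 0.75 and merge into one object, giving two boxes in total. Connected components keep the sketch short; a real pipeline would likely need more robust clustering so that a single noisy overlap cannot chain unrelated masks together.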

Taxonomy

Topics: Advanced Neural Network Applications · 3D Shape Modeling and Analysis · Robotics and Sensor-Based Localization