3D-MOOD: Lifting 2D to 3D for Monocular Open-Set Object Detection

Yung-Hsu Yang; Luigi Piccinelli; Mattia Segu; Siyuan Li; Rui Huang; Yuqian Fu; Marc Pollefeys; Hermann Blum; Zuria Bauer

arXiv:2507.23567·cs.CV·September 8, 2025

TL;DR

This paper introduces 3D-MOOD, an end-to-end monocular 3D object detection method capable of handling open-set scenarios with novel objects and environments, achieving state-of-the-art results across multiple datasets.

Contribution

The paper presents the first end-to-end open-set monocular 3D object detection framework. It lifts open-set 2D detections into 3D space and conditions the object queries on a geometry prior to generalize better across scenes.

Findings

1. Achieves state-of-the-art results on the Omni3D, Argoverse 2, and ScanNet datasets.
2. Effectively generalizes to unseen object categories and environments.
3. Demonstrates the benefit of joint 2D-3D training and canonical image space for open-set detection.

Abstract

Monocular 3D object detection is valuable for various applications such as robotics and AR/VR. Existing methods are confined to closed-set settings, where the training and testing sets consist of the same scenes and/or object categories. However, real-world applications often introduce new environments and novel object categories, posing a challenge to these methods. In this paper, we address monocular 3D object detection in an open-set setting and introduce the first end-to-end 3D Monocular Open-set Object Detector (3D-MOOD). We propose to lift open-set 2D detections into 3D space through our designed 3D bounding box head, enabling end-to-end joint training for both 2D and 3D tasks to yield better overall performance. We condition the object queries on a geometry prior to address the generalization challenge of 3D estimation across diverse scenes. To further improve performance, we design…
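
To make the idea of lifting a 2D detection into 3D concrete, here is a minimal geometric sketch. It is not the paper's 3D bounding box head, which is learned and whose details are not given in this excerpt. It only assumes a standard pinhole camera with known intrinsics and a depth estimate for the object, and back-projects the center of a 2D box to a 3D point in camera coordinates. The names `Intrinsics` and `lift_box_center` are hypothetical and chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Intrinsics:
    """Pinhole camera intrinsics in pixels (assumed known for this sketch)."""
    fx: float
    fy: float
    cx: float
    cy: float


def lift_box_center(box_xyxy, depth, K):
    """Back-project the center of a 2D box to a 3D point in camera coordinates.

    box_xyxy: (x1, y1, x2, y2) in pixels.
    depth:    estimated distance along the optical axis, in meters.
    K:        camera intrinsics.
    Returns (X, Y, Z) in meters.
    """
    if depth <= 0:
        raise ValueError("depth must be positive")
    x1, y1, x2, y2 = box_xyxy
    # Use the 2D box center as the projected 3D center (a simplifying assumption).
    u = 0.5 * (x1 + x2)
    v = 0.5 * (y1 + y2)
    # Invert the pinhole projection u = fx * X / Z + cx, v = fy * Y / Z + cy.
    X = (u - K.cx) * depth / K.fx
    Y = (v - K.cy) * depth / K.fy
    return (X, Y, depth)


if __name__ == "__main__":
    K = Intrinsics(fx=500.0, fy=500.0, cx=320.0, cy=240.0)
    print(lift_box_center((300.0, 200.0, 400.0, 300.0), 10.0, K))
```

A full 3D box additionally needs object dimensions and orientation, which a learned head would have to predict. The sketch covers only the center, where camera geometry alone determines the mapping once depth is known.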

Code & Models

Models

🤗 RoyYang0714/3D-MOOD
model · ♡ 2

Datasets

RoyYang0714/3D-MOOD
dataset · 70 downloads

Taxonomy

Topics: Industrial Vision Systems and Defect Detection