V-MIND: Building Versatile Monocular Indoor 3D Detector with Diverse 2D Annotations

Jin-Cheng Jhang; Tao Tu; Fu-En Wang; Ke Zhang; Min Sun; Cheng-Hao Kuo

arXiv:2412.11412·cs.CV·December 17, 2024

TL;DR

V-MIND leverages large-scale 2D datasets and novel loss functions to improve indoor monocular 3D object detection across diverse classes, reducing reliance on labor-intensive 3D data collection.

Contribution

V-MIND generates pseudo 3D training data from 2D images using monocular depth estimation, and proposes two new losses, a 3D self-calibration loss and an ambiguity loss, to improve training on this data and to handle class ambiguity.

Findings

01 Achieves state-of-the-art performance on the Omni3D indoor dataset.

02 Uses 2D datasets effectively to enhance 3D detection.

03 Introduces novel 3D self-calibration and ambiguity losses.

Abstract

The field of indoor monocular 3D object detection is gaining significant attention, fueled by the increasing demand in VR/AR and robotic applications. However, its advancement is impeded by the limited availability and diversity of 3D training data, owing to the labor-intensive nature of 3D data collection and annotation processes. In this paper, we present V-MIND (Versatile Monocular INdoor Detector), which enhances the performance of indoor 3D detectors across a diverse set of object classes by harnessing publicly available large-scale 2D datasets. By leveraging well-established monocular depth estimation techniques and camera intrinsic predictors, we can generate 3D training data by converting large-scale 2D images into 3D point clouds and subsequently deriving pseudo 3D bounding boxes. To mitigate distance errors inherent in the converted point clouds, we introduce a novel 3D…
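The 2D-to-3D conversion described in the abstract can be illustrated with a minimal sketch. It assumes a standard pinhole camera model and takes the depth map, the object mask, and the intrinsics (`fx`, `fy`, `cx`, `cy`) as plain inputs; in V-MIND these would come from a monocular depth estimator and a camera intrinsic predictor. The function names `backproject` and `axis_aligned_box` are hypothetical, and fitting an axis-aligned box is a simplifying assumption for illustration, not necessarily how the paper derives its pseudo 3D boxes.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def backproject(
    depth: Sequence[Sequence[float]],
    mask: Sequence[Sequence[bool]],
    fx: float,
    fy: float,
    cx: float,
    cy: float,
) -> List[Point]:
    """Lift masked pixels to camera-frame 3D points (pinhole model).

    depth[v][u] is the estimated depth at pixel column u, row v.
    Pixels outside the mask or with non-positive depth are skipped.
    """
    points: List[Point] = []
    for v, (depth_row, mask_row) in enumerate(zip(depth, mask)):
        for u, (z, inside) in enumerate(zip(depth_row, mask_row)):
            if inside and z > 0:
                x = (u - cx) * z / fx
                y = (v - cy) * z / fy
                points.append((x, y, z))
    return points


def axis_aligned_box(points: Sequence[Point]) -> Tuple[Point, Point]:
    """Return (center, size) of the tightest axis-aligned box around points.

    Assumption for illustration only: a real pipeline would likely fit an
    oriented box and filter outliers caused by depth errors.
    """
    if not points:
        raise ValueError("cannot fit a box to an empty point set")
    mins = [min(p[i] for p in points) for i in range(3)]
    maxs = [max(p[i] for p in points) for i in range(3)]
    center = tuple((lo + hi) / 2 for lo, hi in zip(mins, maxs))
    size = tuple(hi - lo for lo, hi in zip(mins, maxs))
    return center, size
```

Because every 3D coordinate scales with the estimated depth `z`, any error in the depth estimate shifts the resulting box along the viewing ray, which is the kind of distance error the abstract says the converted point clouds carry.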

Taxonomy

Topics: Video Surveillance and Tracking Methods · Advanced Image and Video Retrieval Techniques · 3D Surveying and Cultural Heritage

Methods: Sparse Evolutionary Training