V-MIND: Building Versatile Monocular Indoor 3D Detector with Diverse 2D Annotations
Jin-Cheng Jhang, Tao Tu, Fu-En Wang, Ke Zhang, Min Sun, Cheng-Hao Kuo

TL;DR
V-MIND leverages large-scale 2D datasets and novel loss functions to improve indoor monocular 3D object detection across diverse classes, reducing reliance on labor-intensive 3D data collection.
Contribution
It introduces a method that generates pseudo 3D training data from 2D images using monocular depth estimation and predicted camera intrinsics. It also proposes new loss functions that mitigate distance errors in the converted point clouds and handle class ambiguity.
Findings
Achieves state-of-the-art performance on the Omni3D indoor dataset.
Effectively utilizes 2D datasets to enhance 3D detection.
Introduces novel 3D self-calibration and ambiguity losses.
Abstract
The field of indoor monocular 3D object detection is gaining significant attention, fueled by the increasing demand in VR/AR and robotic applications. However, its advancement is impeded by the limited availability and diversity of 3D training data, owing to the labor-intensive nature of 3D data collection and annotation processes. In this paper, we present V-MIND (Versatile Monocular INdoor Detector), which enhances the performance of indoor 3D detectors across a diverse set of object classes by harnessing publicly available large-scale 2D datasets. By leveraging well-established monocular depth estimation techniques and camera intrinsic predictors, we can generate 3D training data by converting large-scale 2D images into 3D point clouds and subsequently deriving pseudo 3D bounding boxes. To mitigate distance errors inherent in the converted point clouds, we introduce a novel 3D…
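The 2D-to-3D conversion described in the abstract can be sketched as follows. This is a minimal illustration, not V-MIND's implementation. It assumes a pinhole camera model, a metric depth map already produced by some monocular depth estimator, predicted intrinsics (`fx`, `fy`, `cx`, `cy`), and an axis-aligned pseudo box fitted by percentile-trimming the lifted points. The helper names (`backproject`, `pseudo_box3d`) and the `trim` parameter are hypothetical. The paper's actual box-fitting procedure, such as orientation estimation or the use of instance masks, may differ.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple

Point3D = Tuple[float, float, float]


@dataclass
class Intrinsics:
    # Pinhole intrinsics; assumed to come from a camera-intrinsic predictor.
    fx: float
    fy: float
    cx: float
    cy: float


def backproject(depth: Sequence[Sequence[float]], K: Intrinsics,
                box2d: Tuple[int, int, int, int]) -> List[Point3D]:
    """Lift pixels inside a 2D box (x0, y0, x1, y1; x1/y1 exclusive) to
    camera-frame 3D points: X = (u - cx) * Z / fx, Y = (v - cy) * Z / fy."""
    x0, y0, x1, y1 = box2d
    points: List[Point3D] = []
    for v in range(y0, y1):
        for u in range(x0, x1):
            z = depth[v][u]
            if z <= 0:  # skip pixels with invalid (non-positive) depth
                continue
            points.append(((u - K.cx) * z / K.fx, (v - K.cy) * z / K.fy, z))
    return points


def _percentile(sorted_vals: List[float], q: float) -> float:
    # Linear interpolation between the two nearest ranks, q in [0, 1].
    idx = q * (len(sorted_vals) - 1)
    lo = int(idx)
    hi = min(lo + 1, len(sorted_vals) - 1)
    frac = idx - lo
    return sorted_vals[lo] * (1 - frac) + sorted_vals[hi] * frac


def pseudo_box3d(points: List[Point3D], trim: float = 0.05
                 ) -> Optional[Tuple[Tuple[float, ...], Tuple[float, ...]]]:
    """Hypothetical axis-aligned pseudo 3D box (center, size). Each axis is
    trimmed to the [trim, 1 - trim] percentile range to reduce the effect of
    background or depth-outlier points that fall inside the 2D box."""
    if not points:
        return None
    center, size = [], []
    for axis in range(3):
        vals = sorted(p[axis] for p in points)
        lo = _percentile(vals, trim)
        hi = _percentile(vals, 1 - trim)
        center.append((lo + hi) / 2)
        size.append(hi - lo)
    return tuple(center), tuple(size)


if __name__ == "__main__":
    K = Intrinsics(fx=1.0, fy=1.0, cx=0.0, cy=0.0)
    depth = [[2.0, 2.0], [2.0, 2.0]]
    pts = backproject(depth, K, (0, 0, 2, 2))
    print(pseudo_box3d(pts, trim=0.0))  # prints ((1.0, 1.0, 2.0), (2.0, 2.0, 0.0))
```

A 2D box also captures background pixels, so a real pipeline would more likely lift only the pixels inside an instance mask. The percentile trim here is only a crude stand-in for outlier handling and is unrelated to the paper's 3D self-calibration loss, which targets distance errors during training.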
