Diversity Matters: Fully Exploiting Depth Clues for Reliable Monocular 3D Object Detection
Zhuoling Li, Zhan Qu, Yang Zhou, Jianzhuang Liu, Haoqian Wang, Lihui Jiang

TL;DR
This paper introduces a novel depth solving system for monocular 3D object detection that leverages multiple depth estimations from diverse assumptions, improving robustness and accuracy without extra data.
Contribution
It proposes a depth estimation approach that generates multiple hypotheses from different assumptions and adaptively combines them for more reliable monocular 3D detection.
Findings
Surpasses the current best method by over 20% on the KITTI benchmark
Achieves higher robustness by exploiting diverse depth clues
Maintains real-time efficiency
Abstract
As an inherently ill-posed problem, depth estimation from single images is the most challenging part of monocular 3D object detection (M3OD). Many existing methods rely on preconceived assumptions to bridge the missing spatial information in monocular images, and predict a sole depth value for every object of interest. However, these assumptions do not always hold in practical applications. To tackle this problem, we propose a depth solving system that fully explores the visual clues from the subtasks in M3OD and generates multiple estimations for the depth of each target. Since the depth estimations rely on different assumptions in essence, they present diverse distributions. Even if some assumptions collapse, the estimations established on the remaining assumptions are still reliable. In addition, we develop a depth selection and combination strategy. This strategy is able to remove…
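The idea of producing several depth estimates per object and then selecting and combining them can be illustrated with a small sketch. The abstract excerpt does not state the exact selection or combination rule, so everything below is an assumption made for illustration: the helper name `combine_depths`, the median-absolute-deviation outlier test, the threshold `k`, and the inverse-variance weighting are hypothetical stand-ins, not the authors' method.

```python
from statistics import median


def combine_depths(depths, sigmas, k=3.0):
    """Fuse several depth hypotheses for one object (illustrative only).

    depths: candidate depths in metres, one per assumption.
    sigmas: positive uncertainty (standard deviation) of each candidate.
    k:      hypothetical threshold on the robust deviation test.
    Returns (fused_depth, number_of_candidates_kept).
    """
    if not depths or len(depths) != len(sigmas):
        raise ValueError("depths and sigmas must be non-empty and equal length")

    # Selection step (assumed): drop candidates far from the median,
    # measured in units of the median absolute deviation (MAD).
    med = median(depths)
    mad = median(abs(d - med) for d in depths)
    tol = max(k * mad, 1e-6)  # small floor so identical candidates survive
    kept = [(d, s) for d, s in zip(depths, sigmas) if abs(d - med) <= tol]

    # Combination step (assumed): inverse-variance weighted average,
    # so more certain candidates contribute more.
    weights = [1.0 / (s * s) for _, s in kept]
    fused = sum(w * d for w, (d, _) in zip(weights, kept)) / sum(weights)
    return fused, len(kept)


# Three candidates agree near 20 m; one (35 m) comes from a failed assumption.
fused, n_kept = combine_depths([20.0, 20.4, 19.6, 35.0], [1.0, 1.0, 1.0, 1.0])
```

In this toy call the 35 m candidate is rejected and the remaining three are averaged, which mirrors the claim that estimates built on the surviving assumptions stay reliable when one assumption collapses.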
Taxonomy
Topics: Advanced Neural Network Applications · Advanced Image and Video Retrieval Techniques · Visual Attention and Saliency Detection
