Object-Aware Centroid Voting for Monocular 3D Object Detection
Wentao Bao, Qi Yu, Yu Kong

TL;DR
This paper introduces an end-to-end monocular 3D object detection method that avoids dense depth estimation. It uses a novel object-aware voting scheme built on 2D box projections and geometric cues, and it outperforms existing monocular methods on the KITTI benchmark.
Contribution
It proposes a new object-aware voting approach for monocular 3D detection that does not require dense depth estimation, improving accuracy and efficiency.
Findings
Outperforms existing monocular methods on the KITTI benchmark
Effectively localizes 3D objects without dense depth maps
Achieves significant accuracy improvements
Abstract
Monocular 3D object detection aims to detect objects in the 3D physical world from a single camera. However, recent approaches either rely on expensive LiDAR devices or resort to dense pixel-wise depth estimation, which incurs prohibitive computational cost. In this paper, we propose an end-to-end trainable monocular 3D object detector that does not learn dense depth. Specifically, the grid coordinates of a 2D box are first projected back to 3D space with the pinhole model to serve as 3D centroid proposals. Then, a novel object-aware voting approach, which considers both the region-wise appearance attention and the geometric projection distribution, votes on the 3D centroid proposals to localize objects in 3D. With late fusion and the predicted 3D orientation and dimensions, the 3D bounding boxes of objects can be detected from a single RGB image. The method is straightforward…
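The abstract describes three steps: back-projecting a grid of 2D box pixels into 3D centroid proposals with the pinhole model, voting over those proposals with appearance and geometric weights, and assembling a 3D box from the voted centroid plus predicted dimensions and orientation. The NumPy sketch below illustrates these steps under stated assumptions; it is not the authors' implementation. Assumed and not taken from the source: the per-box depth comes from the pinhole relation z = f_y · H / h (predicted physical height H over 2D box height h), the "geometric projection distribution" is stood in for by a Gaussian prior centred on the 2D box, appearance attention is a softmax over per-proposal logits, and the yaw follows the KITTI convention of rotation about the camera Y axis. The function names `centroid_proposals`, `vote_centroid` and `box_corners` are hypothetical.

```python
import numpy as np


def centroid_proposals(box2d, K, obj_height, grid=7):
    """Back-project a grid of pixels inside a 2D box to 3D centroid proposals.

    Assumption (not from the source): all proposals share one depth from the
    pinhole relation z = f_y * H / h, with H a predicted physical height (m)
    and h the 2D box height (px).
    """
    x1, y1, x2, y2 = box2d
    fx, fy, cx, cy = K[0, 0], K[1, 1], K[0, 2], K[1, 2]
    z = fy * obj_height / (y2 - y1)
    # Regular grid of pixel coordinates covering the 2D box.
    us, vs = np.meshgrid(np.linspace(x1, x2, grid), np.linspace(y1, y2, grid))
    u, v = us.ravel(), vs.ravel()
    # Inverse pinhole projection: X = (u - cx) z / fx, Y = (v - cy) z / fy.
    X = (u - cx) * z / fx
    Y = (v - cy) * z / fy
    Z = np.full_like(u, z)
    return np.stack([X, Y, Z], axis=1), np.stack([u, v], axis=1)


def vote_centroid(proposals, pixels, appearance_logits, box2d, sigma_frac=0.25):
    """Fuse proposals into one 3D centroid using normalized voting weights.

    Hypothetical weighting: softmax appearance attention times an
    axis-aligned Gaussian prior centred on the 2D box centre.
    """
    x1, y1, x2, y2 = box2d
    center = np.array([(x1 + x2) / 2, (y1 + y2) / 2])
    sigma = sigma_frac * np.array([x2 - x1, y2 - y1])
    geo = np.exp(-0.5 * np.sum(((pixels - center) / sigma) ** 2, axis=1))
    att = np.exp(appearance_logits - appearance_logits.max())
    w = att * geo
    w /= w.sum()
    return w @ proposals  # weighted mean of the 3D proposals


def box_corners(centroid, dims, yaw):
    """Eight corners of a 3D box given its geometric centre, (h, w, l) and yaw.

    The box is centred on `centroid` (KITTI labels instead store the
    bottom-centre); yaw rotates about the camera Y axis.
    """
    h, w, l = dims
    x = np.array([1, 1, -1, -1, 1, 1, -1, -1]) * l / 2
    y = np.array([1, 1, 1, 1, -1, -1, -1, -1]) * h / 2
    z = np.array([1, -1, -1, 1, 1, -1, -1, 1]) * w / 2
    c, s = np.cos(yaw), np.sin(yaw)
    R = np.array([[c, 0, s], [0, 1, 0], [-s, 0, c]])
    return (R @ np.stack([x, y, z])).T + centroid


# Usage with made-up intrinsics and a 60 px tall box centred on the principal point.
K = np.array([[720.0, 0.0, 620.0], [0.0, 720.0, 180.0], [0.0, 0.0, 1.0]])
box = (580.0, 150.0, 660.0, 210.0)
props, pix = centroid_proposals(box, K, obj_height=1.5)
logits = np.zeros(len(props))  # uniform appearance attention
center3d = vote_centroid(props, pix, logits, box)
corners = box_corners(center3d, dims=(1.5, 1.6, 3.9), yaw=0.0)
```

In this symmetric example the voted centroid lands on the optical axis at depth 720 · 1.5 / 60 = 18 m. Replacing the weighted mean with an argmax over the weights would pick a single proposal instead; the weighted mean is used here only because it keeps the sketch differentiable, which suits an end-to-end trainable detector.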
Taxonomy
Topics: Advanced Neural Network Applications · Robotics and Sensor-Based Localization · Visual Attention and Saliency Detection
