Monocular 3D Object Detection Leveraging Accurate Proposals and Shape Reconstruction
Jason Ku, Alex D. Pon, Steven L. Waslander

TL;DR
MonoPSR is a monocular 3D object detection approach that combines accurate proposals from 2D detections with shape reconstruction and novel loss functions to improve localization accuracy, achieving state-of-the-art results on KITTI.
Contribution
The paper introduces a new method that integrates proposal-based 3D detection with shape reconstruction and a projection alignment loss for improved monocular 3D detection.
Findings
Achieves state-of-the-art results on the KITTI benchmark.
Detects pedestrians and cyclists with high accuracy.
Maintains efficient run-time performance.
Abstract
We present MonoPSR, a monocular 3D object detection method that leverages proposals and shape reconstruction. First, using the fundamental relations of a pinhole camera model, detections from a mature 2D object detector are used to generate a 3D proposal per object in a scene. The 3D locations of these proposals prove to be quite accurate, which greatly reduces the difficulty of regressing the final 3D bounding box detection. Simultaneously, a point cloud is predicted in an object-centered coordinate system to learn local scale and shape information. However, the key challenge is how to exploit shape information to guide 3D localization. As such, we devise aggregate losses, including a novel projection alignment loss, to jointly optimize these tasks in the neural network to improve 3D localization accuracy. We validate our method on the KITTI benchmark where we set new state-of-the-art…
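The pinhole-camera step mentioned in the abstract can be illustrated with a small sketch. This is not the authors' code; it only shows the standard similar-triangles relation that lets a 2D box and an assumed physical object height yield a rough 3D centroid. The names Box2D and proposal_centroid, the choice of box height as the depth cue, and all numeric values are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Box2D:
    """Axis-aligned 2D detection in pixel coordinates (hypothetical type)."""
    u_min: float
    v_min: float
    u_max: float
    v_max: float


def proposal_centroid(box, object_height_m, fx, fy, cx, cy):
    """Back-project the 2D box centre to a 3D point in the camera frame.

    Assumes an ideal pinhole camera (no distortion) and that the box
    height in pixels corresponds to the object's physical height.
    Returns (x, y, z) in metres with x right, y down, z forward.
    """
    box_height_px = box.v_max - box.v_min
    if box_height_px <= 0:
        raise ValueError("2D box must have positive height")

    # Similar triangles: height_px / fy = height_m / z
    z = fy * object_height_m / box_height_px

    # Centre of the 2D box in pixels
    u_c = 0.5 * (box.u_min + box.u_max)
    v_c = 0.5 * (box.v_min + box.v_max)

    # Invert the projection u = fx * x / z + cx, v = fy * y / z + cy
    x = (u_c - cx) * z / fx
    y = (v_c - cy) * z / fy
    return (x, y, z)


# Illustrative values only, not taken from the paper or from KITTI
example = proposal_centroid(
    Box2D(600.0, 150.0, 700.0, 250.0),
    object_height_m=1.5, fx=700.0, fy=700.0, cx=600.0, cy=180.0,
)
# example is approximately (0.75, 0.3, 10.5)
```

Depth scales inversely with the pixel height of the box, so a tight 2D detection and a reasonable height estimate already place the object close to its true location, which is the intuition behind using such proposals as a starting point for regression.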
Taxonomy
Topics: Robotics and Sensor-Based Localization · Advanced Neural Network Applications · Human Pose and Action Recognition
