SMOKE: Single-Stage Monocular 3D Object Detection via Keypoint Estimation
Zechen Liu, Zizhang Wu, Roland Tóth

TL;DR
SMOKE is a single-stage monocular 3D object detector that predicts a 3D bounding box for each object by combining a single keypoint estimate with regressed 3D variables, eliminating the 2D proposal and refinement stages. It reports state-of-the-art results on the KITTI dataset.
Contribution
It introduces a single-stage approach combining keypoint estimation with 3D variable regression, improving accuracy and simplicity over previous methods.
Findings
Outperforms existing monocular 3D detection methods on the KITTI dataset.
Achieves state-of-the-art results in both the 3D detection and bird's-eye-view evaluations.
Does not require complex pre/post-processing or extra data.
Abstract
Estimating 3D orientation and translation of objects is essential for infrastructure-less autonomous navigation and driving. In case of monocular vision, successful methods have been mainly based on two ingredients: (i) a network generating 2D region proposals, (ii) a R-CNN structure predicting 3D object pose by utilizing the acquired regions of interest. We argue that the 2D detection network is redundant and introduces non-negligible noise for 3D detection. Hence, we propose a novel 3D object detection method, named SMOKE, in this paper that predicts a 3D bounding box for each detected object by combining a single keypoint estimate with regressed 3D variables. As a second contribution, we propose a multi-step disentangling approach for constructing the 3D bounding box, which significantly improves both training convergence and detection accuracy. In contrast to previous 3D detection…
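The idea of building a 3D box from one keypoint plus regressed 3D variables can be illustrated with a small sketch. This is not the authors' implementation: it assumes, for illustration only, that the keypoint is the image projection of the box centre and that the regressed variables are a depth, box dimensions (h, w, l), and a yaw angle about the camera's vertical axis. The function names and the pinhole intrinsics (fx, fy, cx, cy) are hypothetical.

```python
import math

def backproject_keypoint(u, v, depth, fx, fy, cx, cy):
    """Lift a pixel keypoint (u, v) with a given depth into camera coordinates
    using a pinhole model (assumed; no distortion)."""
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return (x, y, depth)

def box_corners_3d(center, dims, yaw):
    """Return the eight corners of a box centred at `center`.

    dims is (h, w, l); yaw rotates the box about the camera y-axis.
    The axis convention here is an assumption made for this sketch.
    """
    h, w, l = dims
    x0, y0, z0 = center
    c, s = math.cos(yaw), math.sin(yaw)
    corners = []
    for dx in (l / 2, -l / 2):
        for dy in (h / 2, -h / 2):
            for dz in (w / 2, -w / 2):
                # Rotate the local offset about the y-axis, then translate.
                rx = c * dx + s * dz
                rz = -s * dx + c * dz
                corners.append((x0 + rx, y0 + dy, z0 + rz))
    return corners

def project(point, fx, fy, cx, cy):
    """Project a camera-frame 3D point back to pixel coordinates."""
    x, y, z = point
    return (fx * x / z + cx, fy * y / z + cy)

# Hypothetical values: one detected keypoint and its regressed variables.
intrinsics = dict(fx=720.0, fy=720.0, cx=620.0, cy=187.0)
center = backproject_keypoint(700.0, 200.0, 20.0, **intrinsics)
corners = box_corners_3d(center, dims=(1.5, 1.6, 3.9), yaw=0.3)
corners_2d = [project(p, **intrinsics) for p in corners]
```

Because every box parameter is attached to a single keypoint, no 2D region proposal is needed to assemble the box, which is the simplification the abstract argues for.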
Videos
SMOKE: Single-Stage Monocular 3D Object Detection via Keypoint Estimation · YouTube
Taxonomy
Topics: Advanced Neural Network Applications · Robotics and Sensor-Based Localization · Visual Attention and Saliency Detection
Methods: Support Vector Machine · Max Pooling · Convolution · R-CNN
