SMOKE: Single-Stage Monocular 3D Object Detection via Keypoint Estimation

Zechen Liu; Zizhang Wu; Roland Tóth

arXiv:2002.10111·cs.CV·February 25, 2020·27 cites

TL;DR

SMOKE is a single-stage monocular 3D object detector that predicts a 3D bounding box for each detected object by combining a single keypoint estimate with regressed 3D variables. It eliminates the need for 2D proposals and refinement stages, and reports state-of-the-art results on the KITTI dataset.

Contribution

It introduces a single-stage approach that combines keypoint estimation with 3D variable regression, and a multi-step disentangling approach for constructing the 3D bounding box that improves both training convergence and detection accuracy.

Findings

01

Outperforms existing monocular 3D detection methods on the KITTI dataset.

02

Achieves state-of-the-art results in both 3D detection and bird's-eye-view evaluation.

03

Does not require complex pre-processing, post-processing, or extra data.

Abstract

Estimating 3D orientation and translation of objects is essential for infrastructure-less autonomous navigation and driving. In case of monocular vision, successful methods have been mainly based on two ingredients: (i) a network generating 2D region proposals, (ii) a R-CNN structure predicting 3D object pose by utilizing the acquired regions of interest. We argue that the 2D detection network is redundant and introduces non-negligible noise for 3D detection. Hence, we propose a novel 3D object detection method, named SMOKE, in this paper that predicts a 3D bounding box for each detected object by combining a single keypoint estimate with regressed 3D variables. As a second contribution, we propose a multi-step disentangling approach for constructing the 3D bounding box, which significantly improves both training convergence and detection accuracy. In contrast to previous 3D detection…
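The idea of combining a single keypoint with regressed 3D variables can be illustrated with a minimal sketch. It assumes a standard pinhole camera, a keypoint taken to be the projected 3D box center, and a network that has already regressed depth, dimensions, and yaw. The function names, the intrinsics, and the sample values are hypothetical, and this is generic box geometry rather than the paper's exact parametrization, which the truncated abstract does not spell out.

```python
import math

def lift_keypoint(u, v, depth, fx, fy, cx, cy):
    """Back-project pixel (u, v) at a given depth through a pinhole camera."""
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return (x, y, depth)

def box_corners(center, dims, yaw):
    """Return 8 corners of a box with (h, w, l) dims, rotated by yaw about the camera y-axis."""
    h, w, l = dims
    x0, y0, z0 = center
    c, s = math.cos(yaw), math.sin(yaw)
    corners = []
    for dx in (l / 2, -l / 2):
        for dy in (h / 2, -h / 2):
            for dz in (w / 2, -w / 2):
                # Rotate the local offset about the y-axis, then translate to the center.
                corners.append((x0 + c * dx + s * dz, y0 + dy, z0 - s * dx + c * dz))
    return corners

# Hypothetical values: keypoint pixel, regressed depth (m), and camera intrinsics.
center = lift_keypoint(700.0, 200.0, 20.0, fx=720.0, fy=720.0, cx=610.0, cy=175.0)
# Hypothetical regressed dimensions (height, width, length in m) and yaw (rad).
corners = box_corners(center, dims=(1.5, 1.6, 4.0), yaw=0.0)
```

No 2D box appears anywhere in this construction: the keypoint fixes the viewing ray, the regressed depth fixes the position along it, and the remaining regressed variables fix the box extent and heading.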

Code & Models

Videos

SMOKE: Single-Stage Monocular 3D Object Detection via Keypoint Estimation · YouTube

Taxonomy

Topics: Advanced Neural Network Applications · Robotics and Sensor-Based Localization · Visual Attention and Saliency Detection

Methods: Support Vector Machine · Max Pooling · Convolution · R-CNN