RTM3D: Real-time Monocular 3D Detection from Object Keypoints for Autonomous Driving

Peixuan Li; Huaici Zhao; Pengfei Liu; Feidao Cao

arXiv:2001.03343 · cs.CV · January 13, 2020 · 51 cites

TL;DR

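This paper introduces RTM3D, a real-time monocular 3D detection framework for autonomous driving. It predicts the perspective keypoints of each object's 3D bounding box in image space and uses the geometric relationship between the 3D and 2D perspectives to recover the object's dimension, location, and orientation from a single image.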

Contribution

The paper presents the first real-time monocular 3D detection system that predicts nine perspective keypoints and uses geometric constraints, achieving state-of-the-art results without external supervision.

Findings

01. Achieves real-time detection speed on the KITTI benchmark.

02. Outperforms previous methods in 3D detection accuracy.

03. Operates with a small, efficient architecture.

Abstract

In this work, we propose an efficient and accurate monocular 3D detection framework in a single shot. Most successful 3D detectors take the projection constraint from the 3D bounding box to the 2D box as an important component. The four edges of a 2D box provide only four constraints, and performance deteriorates dramatically with even a small error in the 2D detector. Different from these approaches, our method predicts the nine perspective keypoints of a 3D bounding box in image space, and then utilizes the geometric relationship of 3D and 2D perspectives to recover the dimension, location, and orientation in 3D space. In this method, the properties of the object can be predicted stably even when the estimation of keypoints is very noisy, which enables us to obtain fast detection speed with a small architecture. Training our method only uses the 3D properties of the object without the need…
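To make the keypoint idea in the abstract concrete, here is a minimal sketch of the forward geometry: given a box's dimension, location, and orientation, it computes where nine keypoints land in the image under a pinhole camera. This is an illustration, not the authors' implementation. It assumes the nine keypoints are the eight box corners plus the box center, that `location` is the box center in camera coordinates, and that orientation is a single yaw angle about the camera's vertical axis. The function names `box_keypoints_3d` and `project` and the intrinsics `fx, fy, u0, v0` are hypothetical, and the numbers in the usage lines are made up.

```python
import math


def box_keypoints_3d(dims, location, yaw):
    """Return 9 keypoints (8 corners, then the center) in camera coordinates.

    Assumptions (not taken from the paper): dims = (height, width, length),
    location = box center (x, y, z), yaw = rotation about the camera Y axis.
    """
    h, w, l = dims
    cx, cy, cz = location
    c, s = math.cos(yaw), math.sin(yaw)
    points = []
    for sx in (1, -1):
        for sy in (1, -1):
            for sz in (1, -1):
                # Corner offset in the object frame (length along X, height along Y, width along Z).
                ox, oy, oz = sx * l / 2, sy * h / 2, sz * w / 2
                # Rotate about the Y axis, then translate to the box center.
                x = c * ox + s * oz + cx
                y = oy + cy
                z = -s * ox + c * oz + cz
                points.append((x, y, z))
    points.append((cx, cy, cz))  # ninth keypoint: the box center
    return points


def project(points, fx, fy, u0, v0):
    """Pinhole projection of camera-frame points (z > 0) to pixel coordinates."""
    return [(fx * x / z + u0, fy * y / z + v0) for x, y, z in points]


# Usage with made-up values: a car-sized box 20 m straight ahead of the camera.
keypoints_2d = project(
    box_keypoints_3d(dims=(1.5, 1.6, 4.0), location=(0.0, 0.0, 20.0), yaw=0.0),
    fx=700.0, fy=700.0, u0=600.0, v0=180.0,
)
for u, v in keypoints_2d:
    print(round(u, 1), round(v, 1))
```

Each keypoint contributes two image coordinates, so nine keypoints give 18 measured values, compared with the four provided by the edges of a 2D box. Recovering dimension, location, and orientation from predicted keypoints amounts to inverting this projection. With more measurements than unknowns, the fit can tolerate some keypoint noise, which is consistent with the stability the abstract describes.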

Taxonomy

Topics: Advanced Neural Network Applications · Advanced Vision and Imaging · Robotics and Sensor-Based Localization

Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings