Learning Depth-Guided Convolutions for Monocular 3D Object Detection

Mingyu Ding; Yuqi Huo; Hongwei Yi; Zhe Wang; Jianping Shi; Zhiwu Lu; Ping Luo

arXiv:1912.04799 · cs.CV · December 16, 2019 · 30 cites

TL;DR

This paper introduces D4LCN, a depth-guided local convolutional network for monocular 3D object detection. It learns adaptive filters from depth maps and reports a 9.1% improvement over the prior state of the art on KITTI.

Contribution

It proposes a new depth-guided local convolutional network (D4LCN) that automatically learns filters from depth maps, bridging the gap between 2D image features and 3D structure for improved detection.
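To make the idea of depth-derived local filters concrete, here is a minimal NumPy sketch, not the authors' implementation. It applies a different 3x3 depthwise kernel at every pixel, with an optional dilation. The function names `depth_guided_local_conv` and `toy_filters_from_depth` and the parameter `sigma` are hypothetical. In particular, the hand-written depth-similarity weighting is only a stand-in for the learned filter-generating network the paper describes.

```python
import numpy as np


def depth_guided_local_conv(feat, filt, dilation=1):
    """Apply a separate 3x3 depthwise kernel at every pixel.

    feat: (C, H, W) image features.
    filt: (C, 3, 3, H, W) per-channel, per-pixel kernels. In a depth-guided
          network these would be predicted from the depth map (assumption:
          here they are simply passed in).
    """
    C, H, W = feat.shape
    # Zero-pad so that every dilated tap has a value to read.
    padded = np.pad(feat, ((0, 0), (dilation, dilation), (dilation, dilation)))
    out = np.zeros_like(feat, dtype=float)
    for i in range(3):
        for j in range(3):
            dy, dx = i * dilation, j * dilation
            shifted = padded[:, dy:dy + H, dx:dx + W]
            # filt[:, i, j] has shape (C, H, W): one weight per channel and pixel.
            out += filt[:, i, j] * shifted
    return out


def toy_filters_from_depth(depth, channels, dilation=1, sigma=1.0):
    """Hypothetical stand-in for a learned filter generator.

    Weights each neighbour by how close its depth is to the centre pixel's
    depth, then normalises the nine taps to sum to one.
    """
    H, W = depth.shape
    padded = np.pad(depth, dilation, mode="edge")
    taps = np.empty((3, 3, H, W))
    for i in range(3):
        for j in range(3):
            dy, dx = i * dilation, j * dilation
            neighbour = padded[dy:dy + H, dx:dx + W]
            taps[i, j] = np.exp(-((neighbour - depth) ** 2) / (2 * sigma ** 2))
    taps /= taps.sum(axis=(0, 1), keepdims=True)
    # Share the same spatial kernels across channels for simplicity.
    return np.broadcast_to(taps, (channels, 3, 3, H, W))
```

Calling `depth_guided_local_conv(feat, toy_filters_from_depth(depth, feat.shape[0]))` smooths features only among pixels at similar depth. A learned generator would replace the fixed similarity rule with filters trained end to end.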

Findings

1. D4LCN outperforms existing methods by large margins.

2. It achieves a 9.1% improvement over the state of the art on KITTI.

3. Extensive experiments validate the effectiveness of D4LCN.

Abstract

3D object detection from a single image without LiDAR is a challenging task due to the lack of accurate depth information. Conventional 2D convolutions are unsuitable for this task because they fail to capture local object information and object scale, both of which are vital for 3D object detection. To better represent 3D structure, prior work typically transforms depth maps estimated from 2D images into a pseudo-LiDAR representation and then applies existing 3D point-cloud based object detectors. However, the results depend heavily on the accuracy of the estimated depth maps, resulting in suboptimal performance. In this work, instead of using a pseudo-LiDAR representation, we improve the fundamental 2D fully convolutions by proposing a new local convolutional network (LCN), termed Depth-guided Dynamic-Depthwise-Dilated LCN (D$^{4}$LCN), where the filters and their receptive fields can be…
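For context on the pseudo-LiDAR baseline the abstract contrasts against, the following is a small sketch of how a depth map can be back-projected into a 3D point cloud. It assumes a pinhole camera with intrinsics `fx`, `fy`, `cx`, `cy`, none of which are given in the text above. The function name `depth_to_pseudo_lidar` is hypothetical, and the points are left in camera coordinates rather than converted to a LiDAR frame.

```python
import numpy as np


def depth_to_pseudo_lidar(depth, fx, fy, cx, cy):
    """Back-project an (H, W) depth map into an (N, 3) point cloud.

    Assumes a pinhole camera model: pixel (u, v) with depth z maps to
    x = (u - cx) * z / fx, y = (v - cy) * z / fy, in camera coordinates.
    """
    H, W = depth.shape
    v, u = np.indices((H, W))  # v: row index, u: column index
    z = depth
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    points = np.stack([x, y, z], axis=-1).reshape(-1, 3)
    # Drop pixels without a valid (positive) depth estimate.
    return points[points[:, 2] > 0]
```

Because x, y and z all scale with the estimated depth, any error in the depth map moves the whole 3D point. This is one way to see why pseudo-LiDAR pipelines are sensitive to depth accuracy.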

Taxonomy

Topics: Advanced Neural Network Applications · Industrial Vision Systems and Defect Detection · Human Pose and Action Recognition