Perspective-aware Convolution for Monocular 3D Object Detection
Jia-Quan Yu, Soo-Chang Pei

TL;DR
This paper introduces a perspective-aware convolutional layer that captures depth-related features in images, improving monocular 3D object detection accuracy for autonomous driving.
Contribution
It proposes a novel convolutional layer that encodes perspective information, enhancing feature extraction for monocular 3D detection tasks.
Findings
Achieved 23.9% average precision on KITTI3D easy benchmark
Improved depth inference by modeling scene perspective
Enhanced 3D detection accuracy with the new convolutional layer
Abstract
Monocular 3D object detection is a crucial and challenging task for autonomous driving vehicles, as it uses only a single camera image to infer 3D objects in the scene. To address the difficulty of predicting depth from pictorial cues alone, we propose a novel perspective-aware convolutional layer that captures long-range dependencies in images. By constraining convolutional kernels to extract features along the depth axis of every image pixel, we incorporate perspective information into the network architecture. We integrate our perspective-aware convolutional layer into a 3D object detector and demonstrate improved performance on the KITTI3D dataset, achieving 23.9% average precision on the easy benchmark. These results underscore the importance of modeling scene cues for accurate depth inference and highlight the benefits of incorporating scene structure in network design. Our…
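To make the idea of "extracting features along the depth axis of every image pixel" concrete, here is a minimal sketch of one possible reading. It is not the authors' implementation. It assumes that the depth axis at each pixel can be approximated by the image-plane direction from that pixel toward a single vanishing point, and that kernel taps are sampled along that line with nearest-neighbor lookup. The function name `perspective_aware_conv` and the parameters `vanishing_point`, `weights`, and `step` are hypothetical and chosen only for illustration.

```python
import math


def perspective_aware_conv(feat, vanishing_point, weights, step=1.0):
    """Apply a 1D kernel along each pixel's line toward a vanishing point.

    Illustrative assumption, not the paper's layer: the "depth axis" of a
    pixel is taken to be the direction from that pixel to `vanishing_point`.

    feat:            2D list (H rows, W columns) holding one feature channel
    vanishing_point: (x, y) in pixel coordinates (hypothetical input)
    weights:         kernel taps; the middle tap sits on the pixel itself
    step:            spacing in pixels between consecutive taps
    """
    height, width = len(feat), len(feat[0])
    vx, vy = vanishing_point
    half = len(weights) // 2
    out = [[0.0] * width for _ in range(height)]

    for y in range(height):
        for x in range(width):
            # Unit direction from this pixel toward the vanishing point.
            dx, dy = vx - x, vy - y
            norm = math.hypot(dx, dy)
            ux, uy = (dx / norm, dy / norm) if norm > 0 else (0.0, 0.0)

            acc = 0.0
            for k, w in enumerate(weights):
                # Signed distance of this tap along the perspective line.
                t = (k - half) * step
                # Nearest-neighbor sample, clamped to the image border.
                sx = min(max(int(round(x + t * ux)), 0), width - 1)
                sy = min(max(int(round(y + t * uy)), 0), height - 1)
                acc += w * feat[sy][sx]
            out[y][x] = acc
    return out


# Usage: with a kernel that keeps only the tap one step toward the
# vanishing point, each pixel reads the feature of its neighbor on that line.
row = [[1, 2, 3, 4, 5]]
shifted = perspective_aware_conv(row, vanishing_point=(4, 0), weights=[0, 0, 1])
print(shifted)  # [[2.0, 3.0, 4.0, 5.0, 5.0]]
```

Unlike a standard square kernel, the sampling pattern here changes per pixel, so taps follow the scene's perspective lines rather than a fixed grid. A trainable version would typically use bilinear sampling and learned weights per channel, and those details are beyond what the abstract specifies.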
Taxonomy
Topics: Advanced Neural Network Applications · Visual Attention and Saliency Detection · Video Surveillance and Tracking Methods
