Learning Auxiliary Monocular Contexts Helps Monocular 3D Object Detection
Xianpeng Liu, Nan Xue, Tianfu Wu

TL;DR
This paper introduces MonoCon, a novel method for monocular 3D object detection that leverages auxiliary monocular contexts during training, improving accuracy without extra data and achieving fast inference speeds.
Contribution
MonoCon adds auxiliary training tasks built on projected 2D supervision signals to improve monocular 3D detection. The approach is motivated by the Cramér-Wold theorem.
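For reference, the theorem named above is usually stated as follows. This summary only says the method is inspired by it, so the link to projected 2D supervision is best read as a loose analogy (a high-dimensional quantity being characterized through its lower-dimensional projections), not as a formal derivation. The symbols $X_n$, $X$, $t$, and $k$ are generic notation, not taken from the paper.

```latex
% Cramér-Wold theorem (standard statement; notation is generic)
Let $X_n$ and $X$ be random vectors in $\mathbb{R}^k$. Then
\[
  X_n \xrightarrow{d} X
  \quad\Longleftrightarrow\quad
  t^{\top} X_n \xrightarrow{d} t^{\top} X
  \quad \text{for all } t \in \mathbb{R}^k .
\]
Equivalently, the distribution of $X$ is uniquely determined by the
distributions of all of its one-dimensional projections $t^{\top} X$.
```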
Findings
Outperforms prior methods on the KITTI car detection benchmark
Achieves real-time inference at 38.7 fps
Obtains results comparable to prior methods on the pedestrian and cyclist categories
Abstract
Monocular 3D object detection aims to localize 3D bounding boxes from a single input 2D image. It is a highly challenging problem and remains open, especially when no extra information (e.g., depth, LiDAR, and/or multiple frames) can be leveraged in training and/or inference. This paper proposes a simple yet effective formulation for monocular 3D object detection without exploiting any extra information. It presents the MonoCon method, which learns Monocular Contexts as auxiliary tasks in training to help monocular 3D object detection. The key idea is that, given the annotated 3D bounding boxes of objects in an image, a rich set of well-posed projected 2D supervision signals is available in training, such as the projected corner keypoints and their associated offset vectors with respect to the center of the 2D bounding box, and these should be exploited as auxiliary tasks in training. The…
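The projected 2D supervision signals mentioned in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. It assumes a pinhole camera with a hypothetical intrinsic matrix K, a box parameterized by its geometric center, (h, w, l) dimensions, and yaw about the camera Y axis, and it takes the 2D bounding box to be the tight box around the projected corners. All function names are made up for illustration.

```python
import numpy as np


def box3d_corners(center, dims, yaw):
    """Return the 8 corners (8, 3) of a 3D box in camera coordinates.

    Assumed conventions (not from the paper): `center` is the geometric
    box center, `dims` is (h, w, l), and `yaw` rotates about the Y axis.
    """
    h, w, l = dims
    x = np.array([l, l, -l, -l, l, l, -l, -l]) / 2.0
    y = np.array([h, h, h, h, -h, -h, -h, -h]) / 2.0
    z = np.array([w, -w, -w, w, w, -w, -w, w]) / 2.0
    corners = np.stack([x, y, z])  # (3, 8), box-local frame
    c, s = np.cos(yaw), np.sin(yaw)
    rot_y = np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])
    return (rot_y @ corners).T + np.asarray(center, dtype=float)


def project(points, K):
    """Pinhole projection of (N, 3) camera-frame points to (N, 2) pixels."""
    uvw = points @ K.T
    return uvw[:, :2] / uvw[:, 2:3]


def auxiliary_targets(center, dims, yaw, K):
    """Projected corner keypoints, 2D box center, and corner offsets."""
    corners_2d = project(box3d_corners(center, dims, yaw), K)
    # Assumption: the 2D box is the tight box around the projected corners.
    top_left = corners_2d.min(axis=0)
    bottom_right = corners_2d.max(axis=0)
    box_center = (top_left + bottom_right) / 2.0
    offsets = corners_2d - box_center  # one offset vector per corner
    return corners_2d, box_center, offsets


# Hypothetical intrinsics and box, chosen only to exercise the functions.
K = np.array([[100.0, 0.0, 50.0], [0.0, 100.0, 50.0], [0.0, 0.0, 1.0]])
corners_2d, box_center, offsets = auxiliary_targets(
    center=(0.0, 0.0, 10.0), dims=(2.0, 2.0, 4.0), yaw=0.0, K=K
)
```

Each of the three returned arrays could serve as a regression target for an auxiliary head during training. At inference such heads would simply go unused, which is consistent with the summary's claim that the auxiliary contexts are learned in training only.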
Taxonomy
Topics: Advanced Neural Network Applications · Video Surveillance and Tracking Methods · Industrial Vision Systems and Defect Detection
Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings
