Learning Auxiliary Monocular Contexts Helps Monocular 3D Object Detection

Xianpeng Liu; Nan Xue; Tianfu Wu

arXiv:2112.04628 · cs.CV · December 10, 2021

TL;DR

This paper introduces MonoCon, a novel method for monocular 3D object detection that leverages auxiliary monocular contexts during training, improving accuracy without extra data and achieving fast inference speeds.

Contribution

MonoCon uses auxiliary tasks built from projected 2D supervision signals to improve monocular 3D detection, an approach inspired by the Cramér-Wold theorem.
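For reference, the standard statement of the Cramér-Wold theorem cited above is given below. It is not reproduced from the paper, and this page does not say exactly how MonoCon draws on it. Informally, the theorem says a multivariate distribution is fully characterized by its one-dimensional projections; one plausible reading of the inspiration is that many lower-dimensional projected signals can jointly constrain a 3D quantity.

```latex
% Cramér-Wold theorem (standard form); X_n and X are random vectors in R^k.
X_n \xrightarrow{d} X
\quad\Longleftrightarrow\quad
t^{\top} X_n \xrightarrow{d} t^{\top} X \quad \text{for every } t \in \mathbb{R}^k .
% Consequence: the law of X is uniquely determined by the laws of all
% one-dimensional projections t^{\top} X.
```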

Findings

1. Outperforms prior methods on the KITTI car detection benchmark
2. Achieves real-time inference at 38.7 fps
3. Achieves results comparable to prior methods on pedestrians and cyclists

Abstract

Monocular 3D object detection aims to localize 3D bounding boxes in a single input 2D image. It is a highly challenging problem and remains open, especially when no extra information (e.g., depth, LiDAR, and/or multiple frames) can be leveraged in training and/or inference. This paper proposes a simple yet effective formulation for monocular 3D object detection without exploiting any extra information. It presents the MonoCon method, which learns Monocular Contexts as auxiliary tasks in training to help monocular 3D object detection. The key idea is that, given the annotated 3D bounding boxes of objects in an image, a rich set of well-posed projected 2D supervision signals is available in training, such as the projected corner keypoints and their associated offset vectors with respect to the center of the 2D bounding box, and these should be exploited as auxiliary tasks in training. The…
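A minimal sketch of how the projected 2D signals named in the abstract could be computed from a single 3D box annotation: the eight box corners are projected into the image with a 3x4 camera projection matrix, and each projected corner's offset from the 2D box center is taken. It assumes the KITTI label convention (dimensions (h, w, l), location at the bottom center of the box in camera coordinates, yaw `rotation_y`) and a 2D box given as (x1, y1, x2, y2). The function names and the example matrix `P` are hypothetical and are not MonoCon's code.

```python
import numpy as np


def box3d_corners(dims, location, rotation_y):
    """Return the 8 corners of a 3D box as a (3, 8) array in camera coordinates.

    Assumes the KITTI label convention: dims = (h, w, l) in meters, location is
    the bottom center (x, y, z) of the box in the camera frame (y points down),
    and rotation_y is the yaw angle around the camera Y axis.
    """
    h, w, l = dims
    # Corner offsets in the object frame; the origin is the bottom center of the box.
    x = np.array([l / 2, l / 2, -l / 2, -l / 2, l / 2, l / 2, -l / 2, -l / 2])
    y = np.array([0.0, 0.0, 0.0, 0.0, -h, -h, -h, -h])
    z = np.array([w / 2, -w / 2, -w / 2, w / 2, w / 2, -w / 2, -w / 2, w / 2])
    c, s = np.cos(rotation_y), np.sin(rotation_y)
    rot = np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])
    corners = rot @ np.vstack([x, y, z])
    return corners + np.asarray(location, dtype=float).reshape(3, 1)


def project_to_image(points_3d, P):
    """Project (3, N) camera-frame points with a 3x4 matrix P to (N, 2) pixels.

    Assumes every point lies in front of the camera (positive depth).
    """
    homo = np.vstack([points_3d, np.ones((1, points_3d.shape[1]))])  # (4, N)
    uvw = P @ homo  # (3, N)
    return (uvw[:2] / uvw[2:3]).T


def corner_keypoint_targets(dims, location, rotation_y, P, box2d):
    """Hypothetical helper: projected corner keypoints and their offsets
    from the center of the 2D box (x1, y1, x2, y2)."""
    keypoints = project_to_image(box3d_corners(dims, location, rotation_y), P)
    x1, y1, x2, y2 = box2d
    center_2d = np.array([(x1 + x2) / 2, (y1 + y2) / 2])
    offsets = keypoints - center_2d  # one 2D offset vector per corner
    return keypoints, offsets


if __name__ == "__main__":
    # Illustrative projection matrix (made-up intrinsics, not a real KITTI calibration).
    P = np.array([[700.0, 0.0, 600.0, 0.0],
                  [0.0, 700.0, 180.0, 0.0],
                  [0.0, 0.0, 1.0, 0.0]])
    kps, offs = corner_keypoint_targets((1.5, 1.6, 4.0), (0.0, 1.0, 10.0), 0.0,
                                        P, (500.0, 150.0, 700.0, 250.0))
    print(kps.shape, offs.shape)  # (8, 2) (8, 2)
```

The resulting (8, 2) arrays are the kind of regression targets auxiliary heads could be trained on; how MonoCon actually encodes them (for example as heatmaps or normalized offsets) is not stated on this page.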

Code & Models

Videos

Learning Auxiliary Monocular Contexts Helps Monocular 3D Object Detection · Underline

Taxonomy

Topics: Advanced Neural Network Applications · Video Surveillance and Tracking Methods · Industrial Vision Systems and Defect Detection

Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings