Multi-Stage CNN-Based Monocular 3D Vehicle Localization and Orientation Estimation

Ali Babolhavaeji; Mohammad Fanaei

arXiv:2011.12256·cs.CV·November 25, 2020

TL;DR

This paper presents a multi-stage CNN model that estimates 3D vehicle location and orientation from monocular images by combining bird's-eye view elevation maps and deep feature representations, achieving promising results.

Contribution

It introduces a novel multi-branch CNN architecture that integrates elevation mapping with feature extraction for monocular 3D vehicle detection and orientation estimation.

Findings

1. Effective 3D localization and orientation estimation demonstrated on benchmark datasets.

2. The model leverages bird's-eye view elevation maps for improved depth estimation.

3. Promising accuracy in monocular 3D vehicle detection tasks.

Abstract

This paper aims to design a model that detects 3D objects from 2D images taken by monocular cameras by combining an estimated bird's-eye view elevation map with a deep representation of object features. The proposed model uses a pre-trained ResNet-50 network as its backbone and adds three further branches. It first builds a bird's-eye view elevation map to estimate the depth of each object in the scene, then uses that depth to estimate the object's 3D bounding box. We have trained and evaluated it on two major datasets: a synthetic dataset and the KITTI dataset.
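The abstract does not spell out how an estimated depth becomes a 3D location, so the following is only a minimal sketch of one standard way to do it, not the authors' method: back-project a 2D detection center through a pinhole camera model, then find the bird's-eye view grid cell it falls in. The names `Intrinsics`, `backproject`, and `bev_cell`, the grid layout, and all numeric values are hypothetical and chosen for illustration.

```python
import math
from dataclasses import dataclass
from typing import Tuple


@dataclass
class Intrinsics:
    """Pinhole camera parameters in pixels (illustrative values, not from the paper)."""
    fx: float
    fy: float
    cx: float
    cy: float


def backproject(u: float, v: float, depth: float, k: Intrinsics) -> Tuple[float, float, float]:
    """Map pixel (u, v) with depth along the optical axis to camera-frame (X, Y, Z)."""
    if depth <= 0:
        raise ValueError("depth must be positive")
    x = (u - k.cx) * depth / k.fx  # lateral offset, same unit as depth
    y = (v - k.cy) * depth / k.fy  # vertical offset, same unit as depth
    return (x, y, depth)


def bev_cell(x: float, z: float, cell_size: float, x_min: float) -> Tuple[int, int]:
    """Return (row, col) of the bird's-eye view cell containing ground point (x, z).

    Assumed layout: rows grow with forward distance z starting at 0,
    columns grow with lateral position x starting at x_min.
    """
    row = math.floor(z / cell_size)
    col = math.floor((x - x_min) / cell_size)
    return (row, col)


if __name__ == "__main__":
    k = Intrinsics(fx=700.0, fy=700.0, cx=600.0, cy=180.0)
    # A detection centered at pixel (950, 250) with an estimated depth of 20 m.
    x, y, z = backproject(950.0, 250.0, 20.0, k)
    print((x, y, z))                       # (10.0, 2.0, 20.0)
    print(bev_cell(x, z, 0.5, -40.0))      # (40, 100)
```

In this sketch, any error in the depth estimate scales directly into the lateral and vertical coordinates, which is one reason a monocular pipeline would put effort into the depth stage.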
