Multi-Stage CNN-Based Monocular 3D Vehicle Localization and Orientation Estimation
Ali Babolhavaeji, Mohammad Fanaei

TL;DR
This paper presents a multi-stage CNN model that estimates 3D vehicle location and orientation from monocular images by combining an estimated bird's-eye view elevation map with deep feature representations. The model is trained and evaluated on a synthetic dataset and the KITTI dataset.
Contribution
It introduces a novel multi-branch CNN architecture that integrates elevation mapping with feature extraction for monocular 3D vehicle detection and orientation estimation.
Findings
3D localization and orientation estimation are demonstrated on a synthetic dataset and the KITTI dataset.
The model builds a bird's-eye view elevation map to estimate object depth, then uses that depth to estimate 3D bounding boxes.
Promising accuracy in monocular 3D vehicle detection tasks.
Abstract
This paper aims to design a model that detects 3D objects in 2D images taken by monocular cameras by combining an estimated bird's-eye view elevation map with a deep representation of object features. The proposed model uses a pre-trained ResNet-50 network as its backbone and adds three further branches. The model first builds a bird's-eye view elevation map to estimate the depth of each object in the scene, then uses that depth to estimate the object's 3D bounding box. We have trained and evaluated it on two major datasets: a synthetic dataset and the KITTI dataset.
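To make the geometry in the abstract concrete, the following is a minimal sketch, not the authors' implementation, of how an estimated depth and orientation can be turned into a 3D location and a bird's-eye view box footprint. It assumes a pinhole camera with known intrinsics (fx, fy, cx, cy) and a KITTI-style camera frame (x right, y down, z forward, yaw measured about the vertical axis). The function names backproject and bev_footprint and all numeric values are hypothetical and chosen only for illustration.

```python
import math


def backproject(u, v, depth, fx, fy, cx, cy):
    """Pinhole back-projection: pixel (u, v) at a given depth -> camera-frame (x, y, z)."""
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return (x, y, depth)


def bev_footprint(x, z, length, width, yaw):
    """Four ground-plane corners (x, z) of a box centred at (x, z).

    The box is rotated by `yaw` about the vertical (y) axis, using the
    rotation convention assumed here: x' = c*dx + s*dz, z' = -s*dx + c*dz.
    """
    c, s = math.cos(yaw), math.sin(yaw)
    half_l, half_w = length / 2.0, width / 2.0
    # Corners in the box's own frame: length along local x, width along local z.
    local = [(half_l, half_w), (half_l, -half_w),
             (-half_l, -half_w), (-half_l, half_w)]
    return [(x + c * dx + s * dz, z - s * dx + c * dz) for dx, dz in local]


if __name__ == "__main__":
    # Hypothetical values: a 2D box centre, an estimated depth, and made-up intrinsics.
    cx3d, cy3d, cz3d = backproject(700.0, 200.0, 10.0,
                                   fx=700.0, fy=700.0, cx=600.0, cy=180.0)
    for corner in bev_footprint(cx3d, cz3d, length=4.0, width=2.0, yaw=0.3):
        print(corner)
```

In this sketch the depth would come from the elevation-map branch and the yaw and dimensions from the remaining branches; how the paper's three branches actually divide this work is not specified in the abstract.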
