DeepVO: A Deep Learning approach for Monocular Visual Odometry
Vikram Mohanty, Shubh Agrawal, Shaswat Datta, Arna Ghosh, Vishnu Dutt Sharma, Debashish Chakravarty

TL;DR
This paper introduces a deep learning framework for monocular visual odometry, replacing traditional feature-based methods, and demonstrates promising real-time pose estimation results in known environments.
Contribution
It proposes a CNN architecture tailored for monocular visual odometry, highlighting its effectiveness in estimating motion and scale without additional sensors.
Findings
Deep learning can effectively estimate camera motion from monocular images.
In known environments, the proposed CNN gives promising real-time pose estimates.
Pre-trained features influence the network's ability to estimate motion accurately.
Abstract
Deep Learning based techniques have been adopted with precision to solve a lot of standard computer vision problems, some of which are image classification, object detection and segmentation. Despite the widespread success of these approaches, they have not yet been exploited largely for solving the standard perception related problems encountered in autonomous navigation such as Visual Odometry (VO), Structure from Motion (SfM) and Simultaneous Localization and Mapping (SLAM). This paper analyzes the problem of Monocular Visual Odometry using a Deep Learning-based framework, instead of the regular 'feature detection and tracking' pipeline approaches. Several experiments were performed to understand the influence of a known/unknown environment, a conventional trackable feature and pre-trained activations tuned for object classification on the network's ability to accurately estimate the…
Taxonomy
Topics: Robotics and Sensor-Based Localization · Advanced Vision and Imaging · Advanced Image and Video Retrieval Techniques
