Recurrent Neural Network for (Un-)supervised Learning of Monocular Video Visual Odometry and Depth

Rui Wang; Stephen M. Pizer; Jan-Michael Frahm

arXiv:1904.07087·cs.CV·April 16, 2019·41 cites


TL;DR

This paper introduces a recurrent neural network approach for monocular video-based depth and odometry estimation that leverages motion information, achieving superior results in both supervised and unsupervised settings on the KITTI dataset.

Contribution

It presents a novel RNN-based model that incorporates motion cues for depth and odometry estimation, capable of training in supervised or unsupervised modes, and generalizes from multi-view to single-view depth estimation.

Findings

01  Outperforms state-of-the-art learning-based methods for single-view and multi-view depth estimation on the KITTI driving dataset

02  Effective in both supervised and unsupervised training modes

03  Generalizes well from multi-view to single-view depth estimation

Abstract

Deep learning-based, single-view depth estimation methods have recently shown highly promising results. However, such methods ignore one of the most important features for determining depth in the human vision system, which is motion. We propose a learning-based, multi-view dense depth map and odometry estimation method that uses Recurrent Neural Networks (RNN) and trains utilizing multi-view image reprojection and forward-backward flow-consistency losses. Our model can be trained in a supervised or even unsupervised mode. It is designed for depth and visual odometry estimation from video where the input frames are temporally correlated. However, it also generalizes to single-view depth estimation. Our method produces superior results to the state-of-the-art approaches for single-view and multi-view learning-based depth estimation on the KITTI driving dataset.
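The abstract names a forward-backward flow-consistency loss without defining it. As a rough illustration of the general idea only, the sketch below checks that following the forward flow from a pixel and then the backward flow from where it lands returns close to the starting point. The function name `forward_backward_consistency`, the `(H, W, 2)` flow layout, the nearest-neighbor lookup, and the L1 penalty are all assumptions made for this example; the formulation used in the paper may differ.

```python
import numpy as np

def forward_backward_consistency(flow_fw, flow_bw):
    """Mean L1 forward-backward flow inconsistency (illustrative sketch).

    flow_fw: (H, W, 2) flow from frame t to frame t+1, as (dx, dy) per pixel.
    flow_bw: (H, W, 2) flow from frame t+1 back to frame t, same layout.
    This layout and the nearest-neighbor lookup are assumptions of this
    example, not details taken from the paper.
    """
    h, w, _ = flow_fw.shape
    ys, xs = np.mgrid[0:h, 0:w]

    # Where each pixel of frame t lands in frame t+1 (rounded to a pixel).
    xt = np.rint(xs + flow_fw[..., 0]).astype(int)
    yt = np.rint(ys + flow_fw[..., 1]).astype(int)

    # Pixels that leave the image have no backward flow to compare against.
    valid = (xt >= 0) & (xt < w) & (yt >= 0) & (yt < h)
    xt = np.clip(xt, 0, w - 1)
    yt = np.clip(yt, 0, h - 1)

    # Backward flow sampled at the landing position.
    bw_at_target = flow_bw[yt, xt]

    # For consistent flows, forward + backward displacement cancels out.
    residual = np.abs(flow_fw + bw_at_target).sum(axis=-1)
    if not valid.any():
        return 0.0
    return float(residual[valid].mean())
```

For a uniform one-pixel shift to the right paired with a uniform one-pixel shift back to the left, the residual is zero on every pixel that stays inside the image. A trainable version would replace the rounding with differentiable bilinear sampling.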


Code & Models

Repositories

wrlife/RNN_depth_pose (TensorFlow)


Taxonomy

Topics: Advanced Vision and Imaging · Robotics and Sensor-Based Localization · Image Processing Techniques and Applications