Unsupervised Scale-consistent Depth and Ego-motion Learning from Monocular Video

Jia-Wang Bian; Zhichao Li; Naiyan Wang; Huangying Zhan; Chunhua Shen; Ming-Ming Cheng; Ian Reid

arXiv:1908.10553·cs.CV·October 4, 2019·121 cites

TL;DR

This paper introduces an unsupervised framework that learns depth and ego-motion from monocular video with scale-consistent predictions, handles moving objects and occlusions, and reports state-of-the-art depth estimation on KITTI.

Contribution

It proposes a geometry consistency loss and self-discovered masking to improve scale consistency and robustness without multi-task learning, enabling long-term, scale-consistent visual odometry from monocular videos.
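The idea behind the two components can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes the geometry consistency term is a normalized absolute difference between two depth maps that have already been brought into the same view, and that the mask is one minus that difference. The function name `geometry_consistency`, its arguments, and the `eps` constant are hypothetical.

```python
import numpy as np


def geometry_consistency(depth_warped, depth_interp, eps=1e-7):
    """Return (loss, mask) for two depth maps expressed in the same view.

    depth_warped: depth of frame a projected into frame b with the predicted pose.
    depth_interp: depth predicted for frame b, sampled at the same pixels.
    Both inputs are assumed to be positive and already aligned pixel by pixel.
    """
    depth_warped = np.asarray(depth_warped, dtype=np.float64)
    depth_interp = np.asarray(depth_interp, dtype=np.float64)
    # Normalized difference: 0 where the two depths agree, approaching 1
    # where they disagree strongly (assumed form, see lead-in).
    diff = np.abs(depth_warped - depth_interp) / (depth_warped + depth_interp + eps)
    loss = diff.mean()
    # Pixels with large disagreement (moving objects, occlusions) get low weight.
    mask = 1.0 - diff
    return loss, mask


# Toy 2x2 example: the two maps agree everywhere except the last pixel.
a = np.array([[2.0, 4.0], [6.0, 8.0]])
b = np.array([[2.0, 4.0], [6.0, 24.0]])
loss, mask = geometry_consistency(a, b)
```

Under this assumed form, the loss is zero only when the two predictions agree in absolute value, so it penalizes a depth network that predicts consecutive frames at different scales. The same per-pixel difference doubles as a weight that down-weights regions where the static-scene assumption fails.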

Findings

01. Achieves state-of-the-art depth estimation on the KITTI dataset.

02. Predicts globally scale-consistent camera trajectories over long sequences.

03. Achieves visual odometry accuracy competitive with stereo-based methods.

Abstract

Recent work has shown that CNN-based depth and ego-motion estimators can be learned using unlabelled monocular videos. However, the performance is limited by unidentified moving objects that violate the underlying static scene assumption in geometric image reconstruction. More significantly, due to lack of proper constraints, networks output scale-inconsistent results over different samples, i.e., the ego-motion network cannot provide full camera trajectories over a long video sequence because of the per-frame scale ambiguity. This paper tackles these challenges by proposing a geometry consistency loss for scale-consistent predictions and an induced self-discovered mask for handling moving objects and occlusions. Since we do not leverage multi-task learning like recent works, our framework is much simpler and more efficient. Comprehensive evaluation results demonstrate that our depth…
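The trajectory problem described in the abstract can be illustrated with a toy example. The sketch below is not taken from the paper: it uses planar motion, and the helper `chain_poses` and the scale values are hypothetical. It chains relative motions into a global path, once with a single unknown scale factor and once with a different factor per frame.

```python
import math


def chain_poses(relative_poses):
    """Accumulate planar relative motions (dx, dy, dtheta) into global positions."""
    x, y, theta = 0.0, 0.0, 0.0
    trajectory = [(x, y)]
    for dx, dy, dtheta in relative_poses:
        # Rotate the local translation into the global frame, then advance.
        x += math.cos(theta) * dx - math.sin(theta) * dy
        y += math.sin(theta) * dx + math.cos(theta) * dy
        theta += dtheta
        trajectory.append((x, y))
    return trajectory


# Ground truth: four unit steps straight ahead.
true_steps = [(1.0, 0.0, 0.0)] * 4

# Scale-consistent predictions: every step is off by the same factor.
consistent = [(0.5 * dx, 0.5 * dy, dt) for dx, dy, dt in true_steps]

# Scale-inconsistent predictions: each step has its own factor.
per_frame_scales = [0.5, 2.0, 1.0, 3.0]
inconsistent = [
    (s * dx, s * dy, dt) for s, (dx, dy, dt) in zip(per_frame_scales, true_steps)
]

consistent_path = chain_poses(consistent)
inconsistent_path = chain_poses(inconsistent)
```

In the consistent case the chained path is the true path shrunk by one global factor, so a single alignment recovers it. In the inconsistent case no single factor maps the chained path onto the true one, which is why per-frame scale ambiguity prevents full trajectories over long sequences.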

Taxonomy

Topics: Advanced Vision and Imaging · Image Processing Techniques and Applications · Optical measurement and interference techniques