Unsupervised Scale-consistent Depth and Ego-motion Learning from Monocular Video
Jia-Wang Bian, Zhichao Li, Naiyan Wang, Huangying Zhan, Chunhua Shen, Ming-Ming Cheng, Ian Reid

TL;DR
This paper introduces a novel unsupervised learning framework for monocular video that ensures scale consistency in depth and ego-motion estimation, effectively handling moving objects and occlusions, and achieving state-of-the-art results.
Contribution
It proposes a geometry consistency loss and self-discovered masking to improve scale consistency and robustness without multi-task learning, enabling long-term, scale-consistent visual odometry from monocular videos.
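The two ideas named above can be illustrated with a small sketch. This summary does not give the formulas, so the sketch assumes the normalized depth-difference form usually associated with this method. Given two depth estimates of the same view (one warped from the neighbouring frame using the predicted ego-motion, one interpolated from that frame's own depth prediction), the per-pixel inconsistency is |a - b| / (a + b), the loss is its mean over valid pixels, and the mask is one minus the inconsistency. The function names, the `eps` term, and the toy arrays are illustrative assumptions, not the authors' code; the warping and projection steps are omitted.

```python
import numpy as np


def depth_inconsistency(d_warped, d_interp, eps=1e-7):
    """Per-pixel normalized difference between two positive depth maps.

    Values lie in [0, 1): 0 where the two depths agree, approaching 1
    where they disagree strongly. `eps` (an assumption) avoids 0/0.
    """
    return np.abs(d_warped - d_interp) / (d_warped + d_interp + eps)


def geometry_consistency_loss(d_warped, d_interp, valid):
    """Mean inconsistency over pixels flagged as valid (boolean array)."""
    if not valid.any():
        return 0.0  # no pixel projects inside the image: nothing to penalize
    return float(depth_inconsistency(d_warped, d_interp)[valid].mean())


def self_discovered_mask(d_warped, d_interp):
    """Weight in (0, 1] that down-weights geometrically inconsistent pixels,
    e.g. those on moving objects or in occluded regions."""
    return 1.0 - depth_inconsistency(d_warped, d_interp)


# Toy example: the first pixel agrees, the second disagrees (4 vs 12).
d_warped = np.array([[2.0, 4.0]])
d_interp = np.array([[2.0, 12.0]])
valid = np.array([[True, True]])

loss = geometry_consistency_loss(d_warped, d_interp, valid)
mask = self_discovered_mask(d_warped, d_interp)
```

In a training loop the mask would typically weight a photometric reconstruction loss, so that pixels whose depths disagree between views contribute less. Because the difference is normalized by the sum of the two depths, the penalty depends on relative rather than absolute disagreement.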
Findings
Achieves state-of-the-art depth estimation on the KITTI dataset.
Predicts globally scale-consistent camera trajectories over long sequences.
Achieves visual odometry accuracy competitive with stereo-based methods.
Abstract
Recent work has shown that CNN-based depth and ego-motion estimators can be learned using unlabelled monocular videos. However, the performance is limited by unidentified moving objects that violate the underlying static scene assumption in geometric image reconstruction. More significantly, due to lack of proper constraints, networks output scale-inconsistent results over different samples, i.e., the ego-motion network cannot provide full camera trajectories over a long video sequence because of the per-frame scale ambiguity. This paper tackles these challenges by proposing a geometry consistency loss for scale-consistent predictions and an induced self-discovered mask for handling moving objects and occlusions. Since we do not leverage multi-task learning like recent works, our framework is much simpler and more efficient. Comprehensive evaluation results demonstrate that our depth…
Taxonomy
Topics: Advanced Vision and Imaging · Image Processing Techniques and Applications · Optical measurement and interference techniques
