TL;DR
This paper introduces an unsupervised learning approach for monocular depth and ego-motion estimation that uses multiple masks to address occlusion and projection issues, improving accuracy and generalization.
Contribution
It designs two masks, plus a repeated-masking step for rarer cases, to handle the occlusion and blank-region problems that arise when adjacent frames are projected onto each other, making unsupervised depth and ego-motion learning from monocular videos more efficient and accurate.
Findings
Achieves good depth and ego-motion performance on the KITTI dataset
Demonstrates strong generalization on uncalibrated bike videos
Uses geometric relationships to filter mismatched pixels out of training, improving accuracy
Abstract
This paper proposes a new unsupervised method for learning depth and ego-motion from monocular video using multiple masks. The depth estimation network and the ego-motion estimation network are trained under the constraints between depth and ego-motion, without ground-truth values. The main contribution of our method is to carefully consider the occlusion of pixels that arises when adjacent frames are projected onto each other, and the blank regions produced in the imaging plane of the projection target. Two fine masks are designed to resolve most of the pixel mismatches caused by camera movement. In addition, some relatively rare circumstances are considered, and repeated masking is proposed. To some extent, the method uses geometric relationships to filter mismatched pixels out of training, making unsupervised learning more efficient and accurate. The experiments on…
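The abstract describes filtering pixels geometrically: some pixels fall outside the other frame when projected, and some land on the same location as a nearer point and are therefore occluded. The following is a minimal NumPy sketch of that general idea, not the paper's actual mask definitions. The function name `projection_masks`, the tolerance `tol`, and the nearest-pixel z-buffer test are all assumptions made for illustration.

```python
import numpy as np

def projection_masks(depth, K, R, t, tol=1e-3):
    """Project each target-frame pixel into a source view (illustrative only).

    depth : (H, W) depth of the target frame
    K     : (3, 3) camera intrinsics
    R, t  : rotation (3, 3) and translation (3,) from target to source camera
    Returns two boolean (H, W) masks: in_view and not_occluded.
    """
    H, W = depth.shape
    v, u = np.mgrid[0:H, 0:W]
    # Homogeneous pixel coordinates, shape (3, N)
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3).T.astype(float)

    # Back-project to 3D, move into the source camera, project again
    pts = (np.linalg.inv(K) @ pix) * depth.reshape(1, -1)
    pts_src = R @ pts + np.asarray(t, dtype=float).reshape(3, 1)
    z = pts_src[2]
    proj = K @ pts_src

    with np.errstate(divide="ignore", invalid="ignore"):
        us = proj[0] / z
        vs = proj[1] / z
        # Pixels that land inside the source image, in front of the camera
        in_view = (z > 0) & (us >= 0) & (us <= W - 1) & (vs >= 0) & (vs <= H - 1)

    # Assumed occlusion test: a z-buffer over nearest source pixels.
    # A point is kept only if it is the closest one landing in its cell.
    ui = np.rint(us[in_view]).astype(int)
    vi = np.rint(vs[in_view]).astype(int)
    cell = vi * W + ui
    zbuf = np.full(H * W, np.inf)
    np.minimum.at(zbuf, cell, z[in_view])

    not_occluded = np.zeros(H * W, dtype=bool)
    not_occluded[in_view] = z[in_view] <= zbuf[cell] + tol
    return in_view.reshape(H, W), not_occluded.reshape(H, W)
```

In a training loop, a photometric loss would typically be averaged only over pixels where both masks are true, so that mismatched pixels do not contribute gradients. How the paper combines its masks, and what its repeated masking does, is not specified in the text above.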
