Revisiting Self-Supervised Monocular Depth Estimation
Ue-Hwan Kim, Jong-Hwan Kim

TL;DR
This paper conducts a comprehensive empirical study of self-supervised monocular depth and motion estimation methods, revealing key insights and significantly improving performance beyond previous state-of-the-art results.
Contribution
It revisits and analyzes existing self-supervised methods, investigates the effect of architectural factors and the inter-dependencies among those methods, and introduces enhancements that outperform prior state-of-the-art techniques.
Findings
Identified crucial architectural factors affecting performance
Unveiled inter-dependencies among previous methods
Achieved new state-of-the-art results in depth estimation
Abstract
Self-supervised learning of depth map prediction and motion estimation from monocular video sequences is of vital importance, since it enables a broad range of tasks in robotics and autonomous vehicles. Many research efforts have improved performance by tackling challenges such as illumination variation, occlusions, and dynamic objects. However, each of these efforts targets an individual goal and remains a separate work. Moreover, most previous works have adopted the same CNN architecture and thus have not reaped potential architectural benefits. Therefore, the inter-dependencies among previous methods and the effect of architectural factors remain to be investigated. To achieve these objectives, we revisit numerous previously proposed self-supervised methods for joint learning of depth and motion, perform a comprehensive empirical study, and unveil multiple crucial insights.…
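The abstract does not spell out the training objective. As a rough illustration only, the sketch below shows the view-synthesis signal that self-supervised depth-and-motion methods commonly build on: back-project target pixels using predicted depth, move them with a predicted relative pose, reproject them with the camera intrinsics, sample the source frame there, and score the result with a weighted SSIM + L1 photometric error, taking the per-pixel minimum over source frames to reduce the effect of occlusions. This is a generic sketch, not the authors' implementation. The function names, the 3x3 SSIM window, the weight alpha = 0.85, the pose convention (T maps target-camera coordinates to source-camera coordinates), grayscale images in [0, 1], and nearest-neighbour sampling (real training pipelines use differentiable bilinear sampling) are all assumptions.

```python
import numpy as np


def project_to_source(depth, K, T):
    """Map every target pixel to (u, v) coordinates in a source frame.

    Assumed convention: T is a 4x4 rigid transform from target-camera to
    source-camera coordinates; K is the 3x3 pinhole intrinsic matrix.
    """
    h, w = depth.shape
    ys, xs = np.mgrid[0:h, 0:w]
    pix = np.stack([xs.ravel(), ys.ravel(), np.ones(h * w)]).astype(float)  # 3 x N homogeneous
    cam = (np.linalg.inv(K) @ pix) * depth.ravel()     # back-project with per-pixel depth
    cam_h = np.vstack([cam, np.ones((1, h * w))])      # 4 x N homogeneous 3D points
    src = K @ (T @ cam_h)[:3]                          # transform, then reproject
    return (src[:2] / src[2:3]).reshape(2, h, w)       # perspective divide


def sample_nearest(img, uv):
    """Nearest-neighbour lookup (a simplification of bilinear sampling)."""
    h, w = img.shape
    u = np.clip(np.rint(uv[0]), 0, w - 1).astype(int)
    v = np.clip(np.rint(uv[1]), 0, h - 1).astype(int)
    return img[v, u]


def box_filter3(x):
    """3x3 mean filter with reflection padding (assumed SSIM window)."""
    h, w = x.shape
    p = np.pad(x, 1, mode="reflect")
    out = np.zeros((h, w))
    for dy in range(3):
        for dx in range(3):
            out += p[dy:dy + h, dx:dx + w]
    return out / 9.0


def dssim(x, y, c1=0.01 ** 2, c2=0.03 ** 2):
    """Per-pixel structural dissimilarity (1 - SSIM) / 2, clipped to [0, 1]."""
    mu_x, mu_y = box_filter3(x), box_filter3(y)
    var_x = box_filter3(x * x) - mu_x ** 2
    var_y = box_filter3(y * y) - mu_y ** 2
    cov_xy = box_filter3(x * y) - mu_x * mu_y
    num = (2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)
    den = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return np.clip((1.0 - num / den) / 2.0, 0.0, 1.0)


def photometric_error(target, warped, alpha=0.85):
    """Per-pixel weighted SSIM + L1 error (alpha = 0.85 is an assumed weight)."""
    return alpha * dssim(target, warped) + (1.0 - alpha) * np.abs(target - warped)


def min_reprojection_loss(target, warped_sources, alpha=0.85):
    """Per-pixel minimum over warped source frames, averaged over the image."""
    errors = np.stack([photometric_error(target, w, alpha) for w in warped_sources])
    return errors.min(axis=0).mean()
```

With an identity pose the projected coordinates reproduce the pixel grid, so the warped frame equals the target and the loss is essentially zero; adding a second, unrelated source frame leaves it near zero because the per-pixel minimum picks the better match at each pixel. That minimum is one common way to discount pixels that are occluded in some source frames.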
Taxonomy
Topics: Advanced Vision and Imaging · Image Processing Techniques and Applications · Video Surveillance and Tracking Methods
