Self-Supervised Joint Learning Framework of Depth Estimation via Implicit Cues
Jianrong Wang, Ge Zhang, Zhenyu Wu, XueWei Li, Li Liu

TL;DR
This paper introduces a self-supervised joint learning framework for monocular depth estimation that leverages dynamic and static cues from consecutive video frames, improving accuracy, especially for dynamic objects.
Contribution
It proposes an implicit depth cue extractor and a high-dimensional attention module to enhance depth prediction by utilizing motion and geometric scene cues.
Findings
Outperforms state-of-the-art methods on the KITTI and Make3D datasets.
Effectively captures dynamic scene information for improved depth estimation.
Enhances robustness of depth predictions through novel attention mechanisms.
Abstract
In self-supervised monocular depth estimation, depth discontinuities and artifacts from moving objects remain challenging problems. Existing self-supervised methods usually utilize a single view to train the depth estimation network. Compared with static views, the abundant dynamic properties between video frames are beneficial to refined depth estimation, especially for dynamic objects. In this work, we propose a novel self-supervised joint learning framework for depth estimation using consecutive frames from monocular and stereo videos. The main idea is to use an implicit depth cue extractor that leverages dynamic and static cues to generate useful depth proposals. These cues can predict distinguishable motion contours and geometric scene structures. Furthermore, a new high-dimensional attention module is introduced to extract clear global transformation, which effectively suppresses…
Taxonomy
Topics: Advanced Vision and Imaging · Image Processing Techniques and Applications · Advanced Image Processing Techniques
