Mining Supervision for Dynamic Regions in Self-Supervised Monocular Depth Estimation
Hoang Chuong Nguyen, Tianyu Wang, Jose M. Alvarez, Miaomiao Liu

TL;DR
This paper introduces a self-supervised framework that improves monocular depth estimation in dynamic scenes by decoupling static and dynamic regions and using pseudo labels, leading to more accurate depth predictions.
Contribution
It proposes a novel training method that leverages pseudo depth labels and decouples static and dynamic regions for improved depth estimation in self-supervised learning.
Findings
Outperforms existing self-supervised and unsupervised methods on the Cityscapes and KITTI datasets.
Estimates depth in dynamic regions more reliably by decoupling moving objects from the static scene.
Uses a scale alignment module to unify depth estimates across regions.
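A minimal sketch of what scale alignment between two depth estimates can look like. This summary does not describe the paper's actual module, so the code below substitutes a generic median-ratio rescaling as an assumption. The function name align_scale and its arguments are hypothetical and do not come from the paper.

```python
import numpy as np

def align_scale(depth_to_align, reference_depth, mask):
    """Rescale one depth map so it agrees with a reference over `mask`.

    Assumption: a single global scale factor, taken as the median of the
    per-pixel ratio reference / depth. This is a generic technique, not
    necessarily the alignment module used in the paper.
    """
    # Only compare pixels that are inside the mask and have positive depth.
    valid = mask & (depth_to_align > 0) & (reference_depth > 0)
    if not valid.any():
        # Nothing to compare against: leave the depth map unchanged.
        return depth_to_align.copy(), 1.0
    scale = float(np.median(reference_depth[valid] / depth_to_align[valid]))
    return depth_to_align * scale, scale

# Toy example: the reference is exactly twice the depth to align.
depth = np.array([[1.0, 2.0], [4.0, 8.0]])
reference = 2.0 * depth
overlap = np.array([[True, True], [True, False]])
aligned, scale = align_scale(depth, reference, overlap)
```

The median is used here rather than the mean because it is less sensitive to outlier pixels, which are common near object boundaries.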
Abstract
This paper focuses on self-supervised monocular depth estimation in dynamic scenes trained on monocular videos. Existing methods jointly estimate pixel-wise depth and motion, relying mainly on an image reconstruction loss. Dynamic regions remain a critical challenge for these methods due to the inherent ambiguity in depth and motion estimation, resulting in inaccurate depth estimation. This paper proposes a self-supervised training framework exploiting pseudo depth labels for dynamic regions from training data. The key contribution of our framework is to decouple depth estimation for static and dynamic regions of images in the training data. We start with an unsupervised depth estimation approach, which provides reliable depth estimates for static regions and motion cues for dynamic regions and allows us to extract moving object information at the instance level. In the next stage, we…
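The decoupling idea in the abstract can be illustrated with a rough sketch: static pixels are supervised by an image reconstruction error, while dynamic pixels are supervised by pseudo depth labels. The abstract is truncated and does not give the actual losses, so everything below is an assumption for illustration only. The function decoupled_loss, the log-depth L1 term, and the precomputed photo_error input are all hypothetical.

```python
import numpy as np

def decoupled_loss(photo_error, pred_depth, pseudo_depth, dynamic_mask, eps=1e-6):
    """Return (static_term, dynamic_term) for one image.

    photo_error  : per-pixel image reconstruction error (assumed precomputed)
    pred_depth   : depth predicted by the network
    pseudo_depth : pseudo depth labels, only trusted where dynamic_mask is True
    dynamic_mask : boolean mask of pixels belonging to moving objects
    """
    static_mask = ~dynamic_mask

    # Static regions: average reconstruction error, ignoring moving objects.
    static_term = photo_error[static_mask].mean() if static_mask.any() else 0.0

    # Dynamic regions: L1 distance in log-depth to the pseudo labels
    # (a hypothetical choice of supervised term).
    if dynamic_mask.any():
        log_diff = (np.log(pred_depth[dynamic_mask] + eps)
                    - np.log(pseudo_depth[dynamic_mask] + eps))
        dynamic_term = np.abs(log_diff).mean()
    else:
        dynamic_term = 0.0

    return float(static_term), float(dynamic_term)

# Toy example: bottom row is a moving object with a large photometric error.
photo_error = np.array([[0.2, 0.4], [9.0, 9.0]])
pred_depth = np.array([[5.0, 5.0], [3.0, 3.0]])
pseudo_depth = pred_depth.copy()
dynamic_mask = np.array([[False, False], [True, True]])
static_term, dynamic_term = decoupled_loss(
    photo_error, pred_depth, pseudo_depth, dynamic_mask)
```

In this toy case the large reconstruction error on the moving object does not enter the static term, which is the point of masking the two regions separately.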
Taxonomy
Topics: Advanced Vision and Imaging
