DMS: Diffusion-Based Multi-Baseline Stereo Generation for Improving Self-Supervised Depth Estimation
Zihua Liu, Yizhou Li, Songyan Zhang, Masatoshi Okutomi

TL;DR
DMS uses diffusion models to synthesize novel views for self-supervised depth estimation. The added views reduce photometric ambiguity in occluded and out-of-frame regions and yield significant performance improvements.
Contribution
We introduce DMS, a diffusion-based view synthesis method that explicitly addresses occlusion and out-of-frame issues in self-supervised depth estimation.
Findings
Up to 35% reduction in outliers.
State-of-the-art results on multiple benchmarks.
Effective use of unlabeled stereo pairs for training.
Abstract
While supervised stereo matching and monocular depth estimation have advanced significantly with learning-based algorithms, self-supervised methods using stereo images as supervision signals have received relatively less focus and require further investigation. A primary challenge arises from ambiguity introduced during photometric reconstruction, particularly due to missing corresponding pixels in ill-posed regions of the target view, such as occlusions and out-of-frame areas. To address this and establish explicit photometric correspondences, we propose DMS, a model-agnostic approach that utilizes geometric priors from diffusion models to synthesize novel views along the epipolar direction, guided by directional prompts. Specifically, we finetune a Stable Diffusion model to simulate perspectives at key positions: left-left view shifted from the left camera, right-right view shifted…
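The out-of-frame problem described in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation: it assumes a rectified pair, a constant integer disparity, a left-left view at the same baseline as the right camera, and a per-pixel minimum to combine the two reconstructions. The helper names `warp_to_left` and `photometric_error` are hypothetical, and the toy scene has no occlusions.

```python
import numpy as np

def warp_to_left(src, disparity, direction):
    """Reconstruct the left view by sampling `src` along the horizontal epipolar line.

    direction=-1 samples a right image at x - d.
    direction=+1 samples a left-left image at x + d (assumed same baseline).
    Returns the warped image and a mask of pixels whose sample lies in-frame.
    """
    h, w = disparity.shape
    xs = np.arange(w)[None, :] + direction * np.rint(disparity).astype(int)
    valid = (xs >= 0) & (xs < w)
    rows = np.arange(h)[:, None]
    warped = src[rows, np.clip(xs, 0, w - 1)]
    return warped, valid

def photometric_error(target, warped, valid):
    """L1 photometric error; pixels without a correspondence get +inf."""
    return np.where(valid, np.abs(target - warped), np.inf)

# Toy scene: intensity depends only on the world column, disparity is constant.
h, w, d = 2, 8, 3
cols = np.tile(np.arange(w, dtype=float), (h, 1))
left = 10.0 * cols
right = 10.0 * (cols + d)       # right[x - d] == left[x]
left_left = 10.0 * (cols - d)   # left_left[x + d] == left[x]
disparity = np.full((h, w), float(d))

warped_r, valid_r = warp_to_left(right, disparity, direction=-1)
warped_ll, valid_ll = warp_to_left(left_left, disparity, direction=+1)

err_r = photometric_error(left, warped_r, valid_r)
err_ll = photometric_error(left, warped_ll, valid_ll)
min_err = np.minimum(err_r, err_ll)  # assumed way of combining the two views

print("no match in right view, columns:", np.where(~valid_r[0])[0])
print("no match in left-left view, columns:", np.where(~valid_ll[0])[0])
print("every pixel covered by some view:", bool((valid_r | valid_ll).all()))
```

In this toy setup the right view cannot explain the leftmost `d` columns of the left image, and the left-left view cannot explain the rightmost `d` columns, so each view fills the gap the other leaves.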
Taxonomy
Topics: Advanced Vision and Imaging · Optical measurement and interference techniques · Image Processing Techniques and Applications
