StereoDiff: Stereo-Diffusion Synergy for Video Depth Estimation
Haodong Li, Chen Wang, Jiahui Lei, Kostas Daniilidis, Lingjie Liu

TL;DR
StereoDiff introduces a two-stage video depth estimation method that combines stereo matching for static regions and diffusion models for dynamic regions, achieving state-of-the-art results with improved consistency and accuracy.
Contribution
The paper proposes a novel synergy of stereo matching and diffusion models, using stereo matching mainly for static regions and diffusion for dynamic regions, and argues that video depth estimation should not be treated as a naive extension of image depth estimation.
Findings
Achieves state-of-the-art performance on real-world video benchmarks.
Demonstrates superior temporal consistency in depth estimation.
Effectively handles both static backgrounds and dynamic objects.
Abstract
Recent video depth estimation methods achieve great performance by following the paradigm of image depth estimation, i.e., typically fine-tuning pre-trained video diffusion models with massive data. However, we argue that video depth estimation is not a naive extension of image depth estimation. The temporal consistency requirements for dynamic and static regions in videos are fundamentally different. Consistent video depth in static regions, typically backgrounds, can be more effectively achieved via stereo matching across all frames, which provides much stronger global 3D cues. In contrast, consistency for dynamic regions should still be learned from large-scale video depth data to ensure smooth transitions, because moving content violates triangulation constraints. Based on these insights, we introduce StereoDiff, a two-stage video depth estimator that synergizes stereo matching for mainly the…
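The abstract's point about triangulation can be illustrated with a toy calculation. The sketch below is not the authors' method; it is a minimal two-view example under assumed conditions (a pinhole camera translating sideways between two frames, with hypothetical focal length and baseline values) showing that triangulation recovers the depth of a static point but not of a point that moves between frames.

```python
def project(x, z, cam_x, focal):
    """Pinhole projection of a point (x, z) into a camera at (cam_x, 0) looking along +z."""
    return focal * (x - cam_x) / z


def triangulated_depth(u_first, u_second, baseline, focal):
    """Depth from two-view disparity for a sideways-translating camera: Z = f * B / d."""
    disparity = u_first - u_second
    return focal * baseline / disparity


def depth_from_two_frames(x, z, motion, baseline=0.1, focal=500.0):
    """Triangulate a point seen in two consecutive frames.

    The camera moves sideways by `baseline` between the frames, and the
    point itself moves sideways by `motion` (0.0 means a static point).
    All values are illustrative assumptions, not taken from the paper.
    """
    u_first = project(x, z, 0.0, focal)
    u_second = project(x + motion, z, baseline, focal)
    return triangulated_depth(u_first, u_second, baseline, focal)


# A static background point at true depth 4.0.
static_depth = depth_from_two_frames(x=0.5, z=4.0, motion=0.0)

# The same point, but it moves 0.05 sideways between the two frames.
moving_depth = depth_from_two_frames(x=0.5, z=4.0, motion=0.05)

print(f"static point: {static_depth:.3f}")
print(f"moving point: {moving_depth:.3f}")
```

With these assumed values, the static point triangulates to its true depth of about 4.0, while the moving point triangulates to about 8.0, twice the true depth. This is the kind of failure that motivates handling dynamic regions with a learned model rather than with geometry alone.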
Taxonomy
Topics: Advanced Vision and Imaging · Advanced Image Processing Techniques · Image and Signal Denoising Methods
Methods: Diffusion
