Video Depth without Video Models
Bingxin Ke, Dominik Narnhofer, Shengyu Huang, Lei Ke, Torben Peters, Katerina Fragkiadaki, Anton Obukhov, Konrad Schindler

TL;DR
RollingDepth transforms a single-image latent diffusion model into a highly effective video depth estimation tool, producing consistent, accurate depth videos for long sequences by combining multi-frame depth estimation with an optimization-based registration.
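The TL;DR says that multi-frame depth predictions are assembled by an optimization-based registration, but this page does not state the objective. As a rough illustration only, the sketch below assumes each short snippet predicts depth up to an unknown per-snippet scale and shift (affine-invariant depth). It recovers those parameters by alternating least squares: first average the aligned predictions into one depth map per frame, then refit each snippet's scale and shift to that average in closed form. The function name `register_snippets`, the `(frame_indices, depths)` input format, and the use of alternating least squares are hypothetical choices for this sketch. They are not claimed to match the authors' implementation.

```python
import numpy as np


def register_snippets(snippets, num_frames, iters=200):
    """Hypothetical sketch: align affine-invariant snippet depths into one video.

    snippets: list of (frame_indices, depths), depths shaped (len(frame_indices), H, W).
              Frame indices may have gaps (e.g. dilated snippets).
    Returns (scales, shifts, video) where video has shape (num_frames, H, W).
    """
    K = len(snippets)
    H, W = snippets[0][1].shape[1:]
    scales = np.ones(K)   # per-snippet scale, initialised to identity
    shifts = np.zeros(K)  # per-snippet shift, initialised to zero

    # Count how many snippets cover each frame; every frame must be covered.
    cnt = np.zeros(num_frames)
    for idx, _ in snippets:
        for f in idx:
            cnt[f] += 1
    if np.any(cnt == 0):
        raise ValueError("every frame must be covered by at least one snippet")

    video = np.zeros((num_frames, H, W))
    for _ in range(iters):
        # Step 1: with scales/shifts fixed, the best per-frame depth is the
        # mean of all aligned predictions that cover that frame.
        acc = np.zeros((num_frames, H, W))
        for k, (idx, d) in enumerate(snippets):
            for j, f in enumerate(idx):
                acc[f] += scales[k] * d[j] + shifts[k]
        video = acc / cnt[:, None, None]

        # Step 2: with the video fixed, refit each snippet's (scale, shift)
        # by closed-form 1D linear regression. Snippet 0 stays at (1, 0)
        # to remove the global scale/shift ambiguity.
        for k in range(1, K):
            idx, d = snippets[k]
            x = d.ravel()
            y = video[list(idx)].ravel()
            A = np.stack([x, np.ones_like(x)], axis=1)
            (s, t), *_ = np.linalg.lstsq(A, y, rcond=None)
            scales[k], shifts[k] = s, t
    return scales, shifts, video
```

Holding the first snippet at scale 1 and shift 0 fixes the gauge: without it, any common scale and shift applied to the whole video would fit equally well. Each half-step solves its block of unknowns exactly, so the squared alignment error never increases. Overlapping frames are what tie all snippets to one shared depth scale.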
Contribution
The paper introduces RollingDepth, a novel method that leverages a single-image latent diffusion model for state-of-the-art video depth estimation, addressing temporal consistency and long video processing.
Findings
Outperforms existing video depth estimators in accuracy.
Handles long videos with hundreds of frames efficiently.
Produces temporally consistent depth videos without flickering.
Abstract
Video depth estimation lifts monocular video clips to 3D by inferring dense depth at every frame. Recent advances in single-image depth estimation, brought about by the rise of large foundation models and the use of synthetic training data, have fueled a renewed interest in video depth. However, naively applying a single-image depth estimator to every frame of a video disregards temporal continuity, which not only leads to flickering but may also break when camera motion causes sudden changes in depth range. An obvious and principled solution would be to build on top of video foundation models, but these come with their own limitations: expensive training and inference, imperfect 3D consistency, and stitching routines for the fixed-length (short) outputs. We take a step back and demonstrate how to turn a single-image latent diffusion model (LDM) into a state-of-the-art video…
Taxonomy
Topics: 3D Modeling in Geospatial Applications · Computer Graphics and Visualization Techniques · Computational Geometry and Mesh Generation
Methods: Latent Diffusion Model · Diffusion
