Video Depth without Video Models

Bingxin Ke; Dominik Narnhofer; Shengyu Huang; Lei Ke; Torben Peters; Katerina Fragkiadaki; Anton Obukhov; Konrad Schindler

arXiv:2411.19189 · cs.CV · March 18, 2025

TL;DR

RollingDepth transforms a single-image latent diffusion model into a highly effective video depth estimation tool, producing consistent, accurate depth videos for long sequences by combining multi-frame depth estimation with an optimization-based registration.

Contribution

The paper introduces RollingDepth, a novel method that leverages a single-image latent diffusion model for state-of-the-art video depth estimation, addressing temporal consistency and long video processing.

Findings

01. Outperforms existing video depth estimators in accuracy.
02. Handles long videos with hundreds of frames efficiently.
03. Produces temporally consistent depth videos without flickering.

Abstract

Video depth estimation lifts monocular video clips to 3D by inferring dense depth at every frame. Recent advances in single-image depth estimation, brought about by the rise of large foundation models and the use of synthetic training data, have fueled a renewed interest in video depth. However, naively applying a single-image depth estimator to every frame of a video disregards temporal continuity, which not only leads to flickering but may also break when camera motion causes sudden changes in depth range. An obvious and principled solution would be to build on top of video foundation models, but these come with their own limitations, including expensive training and inference, imperfect 3D consistency, and stitching routines for the fixed-length (short) outputs. We take a step back and demonstrate how to turn a single-image latent diffusion model (LDM) into a state-of-the-art video…
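
The registration idea named in the TL;DR can be illustrated with a small sketch. It assumes, beyond what the text above states, that each short depth snippet is only known up to its own scale and shift, and that two snippets share at least one frame. The helper names `fit_scale_shift` and `register_snippet` are hypothetical, and closed-form least squares is used as a simple stand-in; this is not the authors' implementation.

```python
import numpy as np

def fit_scale_shift(src, ref):
    """Least-squares scale s and shift t such that s * src + t ~= ref."""
    # Design matrix with one column for the depth values and one for the offset.
    A = np.stack([src.ravel(), np.ones(src.size)], axis=1)
    sol, *_ = np.linalg.lstsq(A, ref.ravel(), rcond=None)
    return float(sol[0]), float(sol[1])

def register_snippet(snippet, shared_idx, ref_frame):
    """Map a whole snippet (F, H, W) into the scale of a reference frame.

    Assumption: snippet[shared_idx] shows the same video frame as ref_frame.
    """
    s, t = fit_scale_shift(snippet[shared_idx], ref_frame)
    return s * snippet + t

# Synthetic data: three frames of "true" depth, split into two overlapping snippets.
rng = np.random.default_rng(0)
true_depth = rng.uniform(1.0, 10.0, size=(3, 4, 5))
snippet_a = true_depth[:2]                # frames 0 and 1, reference scale
snippet_b = 0.5 * true_depth[1:] - 2.0    # frames 1 and 2, own scale and shift

# Frame 1 is shared, so it anchors snippet_b to snippet_a.
registered_b = register_snippet(snippet_b, 0, snippet_a[1])
```

With noise-free inputs, the shared frame is enough to recover the scale and shift exactly, so the registered snippet matches the reference depth on both of its frames. With many overlapping snippets, the same per-snippet parameters would have to be solved jointly, and a robust loss may be preferable to plain least squares.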

Code & Models

Models

prs-eth/rollingdepth-v1-0 (Hugging Face model · 242 downloads · 16 likes)

Taxonomy

Topics: 3D Modeling in Geospatial Applications · Computer Graphics and Visualization Techniques · Computational Geometry and Mesh Generation

Methods: Latent Diffusion Model · Diffusion