Video Depth Propagation

Luigi Piccinelli; Thiemo Wandel; Christos Sakaridis; Wim Abbeloos; Luc Van Gool

arXiv:2512.10725·cs.CV·December 12, 2025

Open Access

TL;DR

VeloDepth is an efficient online video depth estimation method that leverages spatiotemporal priors and feature propagation to achieve real-time, consistent, and accurate depth predictions across video frames.

Contribution

The paper introduces VeloDepth, an online video depth estimation pipeline built around a novel Propagation Module and a design that structurally enforces temporal consistency, aimed at real-time use.

Findings

01. Achieves state-of-the-art temporal consistency on benchmarks.

02. Provides faster inference compared to existing methods.

03. Maintains competitive depth accuracy in real-time applications.

Abstract

Depth estimation in videos is essential for visual perception in real-world applications. However, existing methods either rely on simple frame-by-frame monocular models, leading to temporal inconsistencies and inaccuracies, or use computationally demanding temporal modeling, unsuitable for real-time applications. These limitations significantly restrict general applicability and performance in practical settings. To address this, we propose VeloDepth, an efficient and robust online video depth estimation pipeline that effectively leverages spatiotemporal priors from previous depth predictions and performs deep feature propagation. Our method introduces a novel Propagation Module that refines and propagates depth features and predictions using flow-based warping coupled with learned residual corrections. In addition, our design structurally enforces temporal consistency, resulting in…
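
The abstract's "flow-based warping coupled with learned residual corrections" can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names `warp_backward` and `propagate_depth`, the backward-flow convention, and the border clamping are all assumptions made for illustration, and the residual is passed in as a plain array, whereas in the paper it would come from a learned network. The sketch also operates on depth maps only, while the paper describes propagating deep features as well.

```python
import numpy as np


def warp_backward(prev, flow):
    """Warp a previous-frame map (H, W) into the current frame.

    Assumed convention: flow[y, x] = (dx, dy) points from current-frame
    pixel (x, y) to its source location in the previous frame. Sampling
    is bilinear, with source coordinates clamped to the image border.
    """
    h, w = prev.shape
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")

    # Source coordinates in the previous frame, clamped to valid range.
    src_x = np.clip(xs + flow[..., 0], 0, w - 1)
    src_y = np.clip(ys + flow[..., 1], 0, h - 1)

    # Integer corners surrounding each source coordinate.
    x0 = np.floor(src_x).astype(int)
    y0 = np.floor(src_y).astype(int)
    x1 = np.minimum(x0 + 1, w - 1)
    y1 = np.minimum(y0 + 1, h - 1)

    # Bilinear interpolation weights.
    wx = src_x - x0
    wy = src_y - y0

    top = prev[y0, x0] * (1 - wx) + prev[y0, x1] * wx
    bottom = prev[y1, x0] * (1 - wx) + prev[y1, x1] * wx
    return top * (1 - wy) + bottom * wy


def propagate_depth(prev_depth, flow, residual):
    """Warp the previous depth with the flow, then add a correction.

    `residual` is a hypothetical stand-in for a learned correction term.
    """
    return warp_backward(prev_depth, flow) + residual
```

With zero flow and zero residual the previous depth passes through unchanged, so any frame-to-frame change in the output comes only from the flow and the correction term. That is one simple way to read how propagating from the previous prediction can tie consecutive outputs together by construction.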

Taxonomy

Topics: Advanced Vision and Imaging · Video Coding and Compression Technologies · Human Pose and Action Recognition