Video Depth Anything: Consistent Depth Estimation for Super-Long Videos

Sili Chen; Hengkai Guo; Shengnan Zhu; Feihu Zhang; Zilong Huang; Jiashi Feng; Bingyi Kang

arXiv:2501.12375 · cs.CV · June 17, 2025

TL;DR

This paper introduces Video Depth Anything, a method for high-quality, temporally consistent depth estimation in super-long videos. It preserves efficiency and generalization while outperforming earlier approaches designed for short videos.

Contribution

It proposes a novel spatial-temporal head and a key-frame-based strategy for consistent depth estimation in arbitrarily long videos, without additional geometric priors.

Findings

01 Achieves state-of-the-art zero-shot video depth estimation results.

02 Supports real-time inference at 30 FPS with a small model.

03 Maintains quality and consistency over videos longer than several minutes.

Abstract

Depth Anything has achieved remarkable success in monocular depth estimation with strong generalization ability. However, it suffers from temporal inconsistency in videos, hindering its practical applications. Various methods have been proposed to alleviate this issue by leveraging video generation models or introducing priors from optical flow and camera poses. Nonetheless, these methods are only applicable to short videos (< 10 seconds) and require a trade-off between quality and computational efficiency. We propose Video Depth Anything for high-quality, consistent depth estimation in super-long videos (over several minutes) without sacrificing efficiency. We base our model on Depth Anything V2 and replace its head with an efficient spatial-temporal head. We design a straightforward yet effective temporal consistency loss by constraining the temporal depth gradient, eliminating the…
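
The idea of constraining the temporal depth gradient can be illustrated with a small sketch. The abstract is truncated here and does not give the loss formula, so the code below is an assumption-laden illustration rather than the authors' formulation: the function name `temporal_gradient_loss`, the L1 penalty, and the same-pixel frame differencing (no optical-flow warping) are all choices made for this example.

```python
from typing import Sequence

# One depth map: rows of per-pixel depth values (hypothetical layout).
Frame = Sequence[Sequence[float]]


def temporal_gradient_loss(pred: Sequence[Frame], target: Sequence[Frame]) -> float:
    """Mean absolute gap between predicted and reference frame-to-frame depth changes.

    Illustrative only: pixels are compared at the same coordinates in
    adjacent frames, and the penalty is a plain L1 average.
    """
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same number of frames")
    if len(pred) < 2:
        raise ValueError("need at least two frames to form a temporal gradient")

    total = 0.0
    count = 0
    for t in range(len(pred) - 1):
        for y in range(len(pred[t])):
            for x in range(len(pred[t][y])):
                # Temporal gradient: depth change at one pixel between frames t and t+1.
                d_pred = pred[t + 1][y][x] - pred[t][y][x]
                d_true = target[t + 1][y][x] - target[t][y][x]
                total += abs(d_pred - d_true)
                count += 1
    return total / count


# Three 1x2 frames: a static reference and a prediction that flickers at one pixel.
steady = [[[1.0, 2.0]], [[1.0, 2.0]], [[1.0, 2.0]]]
flicker = [[[1.0, 2.0]], [[1.5, 2.0]], [[1.0, 2.0]]]
shifted = [[[3.0, 4.0]], [[3.0, 4.0]], [[3.0, 4.0]]]

flicker_loss = temporal_gradient_loss(flicker, steady)
offset_loss = temporal_gradient_loss(shifted, steady)
```

In this sketch a prediction that flickers over time is penalized, while one that differs from the reference only by a constant offset is not, because only frame-to-frame changes are compared. A per-frame accuracy term would normally be needed alongside such a loss.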

Code & Models

Repositories

DepthAnything/Video-Depth-Anything
pytorch

Taxonomy

Topics: Advanced Vision and Imaging · Video Coding and Compression Technologies · Computer Graphics and Visualization Techniques

Methods: Balanced Selection