Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data
Lihe Yang, Bingyi Kang, Zilong Huang, Xiaogang Xu, Jiashi Feng, Hengshuang Zhao

TL;DR
Depth Anything introduces a scalable, data-driven approach for monocular depth estimation by leveraging large-scale unlabeled data and simple strategies, achieving state-of-the-art zero-shot and fine-tuned results.
Contribution
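The paper presents a practical foundation model for monocular depth estimation. Rather than proposing novel technical modules, it scales training up with roughly 62M automatically annotated unlabeled images and relies on two simple strategies: a harder optimization target built with data augmentation, and auxiliary supervision that transfers semantic priors from pre-trained encoders.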
Findings
Achieves impressive zero-shot generalization across multiple datasets.
Sets new state-of-the-art results after fine-tuning on NYUv2 and KITTI.
Enables improved depth-conditioned ControlNet performance.
Abstract
This work presents Depth Anything, a highly practical solution for robust monocular depth estimation. Without pursuing novel technical modules, we aim to build a simple yet powerful foundation model dealing with any images under any circumstances. To this end, we scale up the dataset by designing a data engine to collect and automatically annotate large-scale unlabeled data (~62M), which significantly enlarges the data coverage and thus is able to reduce the generalization error. We investigate two simple yet effective strategies that make data scaling-up promising. First, a more challenging optimization target is created by leveraging data augmentation tools. It compels the model to actively seek extra visual knowledge and acquire robust representations. Second, an auxiliary supervision is developed to enforce the model to inherit rich semantic priors from pre-trained encoders. We…
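The second strategy in the abstract, auxiliary supervision that lets the model inherit semantic priors from a pre-trained encoder, can be illustrated with a feature-alignment loss. The following is a minimal sketch, not the authors' implementation: the cosine-similarity form, the optional `margin` tolerance, the function name, and the `(N, C)` array layout are all assumptions made for illustration, since the abstract excerpt does not specify them.

```python
import numpy as np


def feature_alignment_loss(student_feats, teacher_feats, margin=None):
    """Hypothetical auxiliary loss: 1 - mean cosine similarity.

    student_feats, teacher_feats: arrays of shape (N, C), one feature
    vector per pixel/patch (layout is an assumption for this sketch).
    margin: optional tolerance; positions whose cosine similarity already
    reaches `margin` are excluded from the loss (also an assumption).
    """
    student = np.asarray(student_feats, dtype=np.float64)
    teacher = np.asarray(teacher_feats, dtype=np.float64)

    # L2-normalise each feature vector; eps avoids division by zero.
    eps = 1e-8
    student = student / (np.linalg.norm(student, axis=1, keepdims=True) + eps)
    teacher = teacher / (np.linalg.norm(teacher, axis=1, keepdims=True) + eps)

    # Per-position cosine similarity between student and frozen teacher.
    cos = np.sum(student * teacher, axis=1)

    if margin is not None:
        keep = cos < margin
        if not np.any(keep):
            return 0.0  # every position is already aligned well enough
        cos = cos[keep]

    return float(1.0 - cos.mean())
```

The margin reflects one possible design choice: a depth model may not need to copy the semantic encoder exactly, so positions that are already close enough stop contributing to the loss.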
Code & Models
- 🤗 depth-anything/Depth-Anything-V2-Metric-Indoor-Small-hf (model) · 6.6k dl · ♡ 8
- 🤗 apple/coreml-depth-anything-v2-small (model) · 710 dl · ♡ 93
- 🤗 LiheYoung/depth-anything-small-hf (model) · 90k dl · ♡ 35
- 🤗 LiheYoung/depth-anything-base-hf (model) · 48k dl · ♡ 12
- 🤗 LiheYoung/depth_anything_vits14 (model) · 3.6k dl · ♡ 9
- 🤗 LiheYoung/depth_anything_vitb14 (model) · 2.0k dl · ♡ 3
- 🤗 LiheYoung/depth_anything_vitl14 (model) · 40k dl · ♡ 43
- 🤗 LiheYoung/depth-anything-large-hf (model) · 415k dl · ♡ 62
- 🤗 halimb/depth-anything-small-hf (model) · 8 dl · ♡ 1
- 🤗 halimb/depth-anything-large-hf (model) · 7 dl
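The `-hf` checkpoints listed above can typically be loaded through the Hugging Face `transformers` depth-estimation pipeline. The sketch below assumes `transformers`, `torch`, and `Pillow` are installed and that the model can be downloaded; `load_depth_estimator` and `depth_to_uint8` are hypothetical helper names introduced here, and the image path in the usage comments is a placeholder.

```python
import numpy as np


def load_depth_estimator(model_id="LiheYoung/depth-anything-small-hf"):
    """Build a depth-estimation pipeline for one of the listed checkpoints."""
    # Imported lazily so depth_to_uint8 works without transformers installed.
    from transformers import pipeline

    return pipeline(task="depth-estimation", model=model_id)


def depth_to_uint8(depth):
    """Min-max scale a depth map to 0..255 for visualisation (hypothetical helper)."""
    depth = np.asarray(depth, dtype=np.float64)
    lo, hi = depth.min(), depth.max()
    if hi - lo < 1e-12:
        # A constant map carries no contrast; return all zeros.
        return np.zeros(depth.shape, dtype=np.uint8)
    return np.round((depth - lo) / (hi - lo) * 255.0).astype(np.uint8)


# Usage (requires network access and a local image; the path is a placeholder):
#   from PIL import Image
#   estimator = load_depth_estimator()
#   result = estimator(Image.open("example.jpg"))
#   gray = depth_to_uint8(result["predicted_depth"].squeeze().numpy())
```

The smaller checkpoints are a reasonable starting point for a quick test; swapping `model_id` for the base or large variant should not require other code changes under this interface.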
Taxonomy
Topics: Advanced Vision and Imaging · Image Processing Techniques and Applications · Advanced Image Processing Techniques
