Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data

Lihe Yang, Bingyi Kang, Zilong Huang, Xiaogang Xu, Jiashi Feng, Hengshuang Zhao

arXiv:2401.10891 · cs.CV · April 9, 2024 · 22 cites

TL;DR

Depth Anything introduces a scalable, data-driven approach for monocular depth estimation by leveraging large-scale unlabeled data and simple strategies, achieving state-of-the-art zero-shot and fine-tuned results.

Contribution

The paper presents a practical foundation model for monocular depth estimation that relies on scaling up unlabeled data (~62M automatically annotated images) and on two simple training strategies, rather than on novel architectural modules.
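
One of those strategies is an auxiliary feature-alignment loss that keeps the depth model's encoder features close to those of a frozen pre-trained encoder (DINOv2 in the paper), measured by per-pixel cosine similarity, with a tolerance margin so that pixels that are already well aligned stop contributing. The Python sketch below illustrates the idea on plain lists of per-pixel feature vectors. It is not the authors' code: the function names are hypothetical, averaging only over the not-yet-aligned pixels is an assumption, and the default margin of 0.85 is an assumed value.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity of two feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return dot / (na * nb)


def feature_alignment_loss(
    student: List[Vector], frozen: List[Vector], margin: float = 0.85
) -> float:
    """1 - mean cosine similarity over pixels that are not yet aligned.

    `student` holds one feature vector per pixel from the depth model's encoder,
    `frozen` the matching vectors from a frozen pre-trained encoder.
    Pixels whose similarity already exceeds `margin` (an assumed default)
    are skipped, so the loss does not force an exact copy of the frozen features.
    """
    if len(student) != len(frozen):
        raise ValueError("feature maps must have the same number of pixels")
    sims = [cosine(f, g) for f, g in zip(student, frozen)]
    kept = [s for s in sims if s <= margin]
    if not kept:  # every pixel is already within the margin
        return 0.0
    return 1.0 - sum(kept) / len(kept)
```

For example, with student features [[1, 0], [0, 1]] and frozen features [[1, 0], [1, 0]], the first pixel (similarity 1.0) is skipped and the loss is 1.0; raising the margin to 1.0 keeps both pixels and gives 0.5. Skipping already-aligned pixels means the auxiliary loss only nudges features that disagree strongly.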

Findings

1. Achieves strong zero-shot generalization, evaluated on six public datasets and on randomly captured photos.

2. Sets new state-of-the-art metric depth results after fine-tuning on NYUv2 and KITTI.

3. Yields a better depth-conditioned ControlNet.
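
Zero-shot relative depth results like these are typically scored by first fitting a per-image scale and shift that maps the prediction onto the ground truth, then computing AbsRel and δ1 (the fraction of pixels whose prediction is within a factor of 1.25 of the ground truth). The Python sketch below shows that protocol on plain lists. The function names are hypothetical, and details such as whether alignment happens in depth or inverse-depth space and which pixels are masked out follow each benchmark's own evaluation code, not this example.

```python
from typing import List, Sequence, Tuple


def align_scale_shift(pred: Sequence[float], gt: Sequence[float]) -> List[float]:
    """Least-squares fit of s * pred + t to gt; returns the aligned prediction."""
    n = len(pred)
    if n != len(gt) or n < 2:
        raise ValueError("need two equally sized sequences with at least 2 values")
    mean_p = sum(pred) / n
    mean_g = sum(gt) / n
    var_p = sum((p - mean_p) ** 2 for p in pred)
    if var_p == 0.0:
        raise ValueError("prediction is constant; scale is undetermined")
    cov = sum((p - mean_p) * (g - mean_g) for p, g in zip(pred, gt))
    s = cov / var_p  # closed-form least-squares slope
    t = mean_g - s * mean_p  # and intercept
    return [s * p + t for p in pred]


def abs_rel_and_delta1(pred: Sequence[float], gt: Sequence[float]) -> Tuple[float, float]:
    """AbsRel = mean |pred - gt| / gt; delta1 = share with max(pred/gt, gt/pred) < 1.25."""
    valid = [(p, g) for p, g in zip(pred, gt) if g > 0]  # ignore pixels without ground truth
    if not valid:
        raise ValueError("no pixels with positive ground truth")
    abs_rel = sum(abs(p - g) / g for p, g in valid) / len(valid)
    # Non-positive predictions cannot be within the ratio threshold.
    within = sum(1 for p, g in valid if p > 0 and max(p / g, g / p) < 1.25)
    return abs_rel, within / len(valid)
```

A prediction that differs from the ground truth only by scale and shift, such as [0.5, 1.0, 1.5, 2.0] against [1, 2, 3, 4], aligns exactly and scores AbsRel 0.0 and δ1 1.0, which is why these metrics measure relative rather than metric depth quality.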

Abstract

This work presents Depth Anything, a highly practical solution for robust monocular depth estimation. Without pursuing novel technical modules, we aim to build a simple yet powerful foundation model dealing with any images under any circumstances. To this end, we scale up the dataset by designing a data engine to collect and automatically annotate large-scale unlabeled data (~62M), which significantly enlarges the data coverage and thus is able to reduce the generalization error. We investigate two simple yet effective strategies that make data scaling-up promising. First, a more challenging optimization target is created by leveraging data augmentation tools. It compels the model to actively seek extra visual knowledge and acquire robust representations. Second, an auxiliary supervision is developed to enforce the model to inherit rich semantic priors from pre-trained encoders. We…
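
The first strategy can be made concrete with CutMix, one of the strong perturbations the paper applies to the student's unlabeled inputs (together with color jittering and Gaussian blurring). Two unlabeled images are pasted together with a binary mask, and each region of the student's prediction is compared with the teacher's pseudo-label for the image that region came from, using an affine-invariant loss that normalizes depth by its median and its mean absolute deviation from the median. The Python sketch below works on flattened per-pixel lists. The function names are hypothetical, the pseudo-labels are assumed to come from a teacher model run on the clean images, and normalizing each region separately and weighting the two region losses by their pixel share are assumptions of this sketch rather than the authors' code.

```python
import statistics
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def normalize(d: Sequence[float]) -> List[float]:
    """Subtract the median, then divide by the mean absolute deviation from it."""
    t = statistics.median(d)
    s = sum(abs(x - t) for x in d) / len(d)
    if s == 0.0:  # constant map: no structure left after removing the offset
        return [0.0] * len(d)
    return [(x - t) / s for x in d]


def affine_invariant_loss(pred: Sequence[float], target: Sequence[float]) -> float:
    """Mean absolute difference between the normalized prediction and target."""
    p, g = normalize(pred), normalize(target)
    return sum(abs(a - b) for a, b in zip(p, g)) / len(p)


def cutmix(img_a: Sequence[T], img_b: Sequence[T], mask: Sequence[bool]) -> List[T]:
    """Take pixels from img_a where mask is True and from img_b elsewhere."""
    return [a if m else b for a, b, m in zip(img_a, img_b, mask)]


def cutmix_loss(
    student_pred: Sequence[float],
    pseudo_a: Sequence[float],
    pseudo_b: Sequence[float],
    mask: Sequence[bool],
) -> float:
    """Area-weighted affine-invariant loss of each region against its own pseudo-label."""
    inside = [i for i, m in enumerate(mask) if m]
    outside = [i for i, m in enumerate(mask) if not m]
    loss = 0.0
    for idx, pseudo in ((inside, pseudo_a), (outside, pseudo_b)):
        if idx:  # skip a region the mask leaves empty
            region_pred = [student_pred[i] for i in idx]
            region_label = [pseudo[i] for i in idx]
            loss += len(idx) / len(mask) * affine_invariant_loss(region_pred, region_label)
    return loss
```

Because each region is normalized on its own, a student prediction that matches the corresponding pseudo-label up to a positive scale and any shift in both regions scores 0.0; the student is penalized only for getting the relative structure of each pasted region wrong.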

Taxonomy

Topics: Advanced Vision and Imaging · Image Processing Techniques and Applications · Advanced Image Processing Techniques