Prompting Depth Anything for 4K Resolution Accurate Metric Depth Estimation

Haotong Lin; Sida Peng; Jingxiao Chen; Songyou Peng; Jiaming Sun; Minghuan Liu; Hujun Bao; Jiashi Feng; Xiaowei Zhou; Bingyi Kang

arXiv:2412.14015·cs.CV·March 31, 2026

TL;DR

This paper introduces Prompt Depth Anything, a novel method integrating LiDAR prompts into depth models to achieve high-resolution metric depth estimation up to 4K, with state-of-the-art results.

Contribution

It introduces prompting into depth foundation models, using a low-cost LiDAR as the prompt and a scalable data pipeline (synthetic-data LiDAR simulation and real-data pseudo GT depth generation) to improve accuracy and resolution.

Findings

01. Achieves up to 4K resolution in depth estimation.

02. Sets a new state of the art on the ARKitScenes and ScanNet++ datasets.

03. Enhances downstream tasks like 3D reconstruction and robotic grasping.

Abstract

Prompts play a critical role in unleashing the power of language and vision foundation models for specific tasks. For the first time, we introduce prompting into depth foundation models, creating a new paradigm for metric depth estimation termed Prompt Depth Anything. Specifically, we use a low-cost LiDAR as the prompt to guide the Depth Anything model for accurate metric depth output, achieving up to 4K resolution. Our approach centers on a concise prompt fusion design that integrates the LiDAR at multiple scales within the depth decoder. To address training challenges posed by limited datasets containing both LiDAR depth and precise GT depth, we propose a scalable data pipeline that includes synthetic data LiDAR simulation and real data pseudo GT depth generation. Our approach sets a new state of the art on the ARKitScenes and ScanNet++ datasets and benefits downstream applications,…
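The multi-scale prompt fusion mentioned in the abstract can be illustrated with a toy sketch. This is not the paper's architecture: the names `resize_nearest` and `fuse_prompt`, the nearest-neighbour resizing, and the fixed scalar `weight` are all assumptions made for illustration, standing in for whatever learned layers the actual model uses. The sketch only shows the general idea of resizing one low-resolution LiDAR depth map to each decoder scale and adding it to the features at that scale.

```python
from typing import List

Grid = List[List[float]]


def resize_nearest(grid: Grid, out_h: int, out_w: int) -> Grid:
    """Nearest-neighbour resize of a 2D grid to (out_h, out_w)."""
    in_h, in_w = len(grid), len(grid[0])
    return [
        [grid[(y * in_h) // out_h][(x * in_w) // out_w] for x in range(out_w)]
        for y in range(out_h)
    ]


def fuse_prompt(feature: Grid, lidar: Grid, weight: float) -> Grid:
    """Hypothetical fusion step: resize the LiDAR prompt to the feature
    resolution and add it. `weight` is a stand-in for learned parameters."""
    h, w = len(feature), len(feature[0])
    prompt = resize_nearest(lidar, h, w)
    return [
        [feature[y][x] + weight * prompt[y][x] for x in range(w)]
        for y in range(h)
    ]


# Toy low-resolution LiDAR depth (metres) and all-zero decoder features
# at three assumed scales.
lidar = [[1.0, 2.0], [3.0, 4.0]]
scales = [(2, 2), (4, 4), (8, 8)]
features = [[[0.0] * w for _ in range(h)] for h, w in scales]

# The same prompt is injected at every scale of the (toy) decoder.
fused = [fuse_prompt(f, lidar, weight=1.0) for f in features]
```

Because the toy features are all zero, each entry of `fused` is simply the LiDAR map resized to that scale, which makes it easy to check that the prompt reaches every resolution.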

Code & Models

Repositories

depthanything/PromptDA (GitHub)

Videos

Prompting Depth Anything for 4K Resolution Accurate Metric Depth Estimation (SlidesLive)