DepthMaster: Taming Diffusion Models for Monocular Depth Estimation

Ziyang Song; Zerong Wang; Bo Li; Hao Zhang; Ruijie Zhu; Li Liu; Peng-Tao Jiang; Tianzhu Zhang

arXiv:2501.02576 · cs.CV · April 24, 2026

TL;DR

DepthMaster introduces a single-step diffusion model with feature alignment and Fourier enhancement modules for improved monocular depth estimation, balancing global structure and fine details.

Contribution

The paper proposes a single-step diffusion approach with two modules: a Feature Alignment module that incorporates semantic features, and a Fourier-based module for detail enhancement. Together they target both depth estimation accuracy and inference efficiency.

Findings

01. Achieves state-of-the-art generalization in depth estimation.

02. Effectively balances low-frequency structure and high-frequency details.

03. Outperforms existing diffusion-based methods across datasets.
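
To make the low-frequency versus high-frequency distinction in the second finding concrete, the sketch below splits a 2D map into a smooth component and a detail component with a circular mask in the Fourier domain. It is a generic illustration, not the paper's Fourier module: the function name `split_frequencies`, the `radius` cutoff, and the synthetic test map are all assumptions introduced here.

```python
import numpy as np

def split_frequencies(depth, radius):
    """Split a 2D map into low- and high-frequency parts (hypothetical helper)."""
    h, w = depth.shape
    # Move the zero-frequency (DC) term to the centre of the spectrum.
    spectrum = np.fft.fftshift(np.fft.fft2(depth))
    yy, xx = np.ogrid[:h, :w]
    dist = np.sqrt((yy - h // 2) ** 2 + (xx - w // 2) ** 2)
    low_mask = dist <= radius  # frequencies inside the circle count as "low"
    low = np.fft.ifft2(np.fft.ifftshift(spectrum * low_mask)).real
    high = np.fft.ifft2(np.fft.ifftshift(spectrum * ~low_mask)).real
    return low, high

# Synthetic example: a smooth horizontal ramp plus small noise (assumed data).
rng = np.random.default_rng(0)
ramp = np.linspace(0.0, 1.0, 64)[None, :].repeat(64, axis=0)
depth = ramp + 0.05 * rng.standard_normal((64, 64))
low, high = split_frequencies(depth, radius=4)
```

Because the two masks are complementary, `low + high` reconstructs the input, and the mean of the map lives entirely in `low`. A smaller `radius` pushes more of the signal into the detail component.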

Abstract

Monocular depth estimation within the diffusion-denoising paradigm demonstrates impressive generalization ability but suffers from low inference speed. Recent methods adopt a single-step deterministic paradigm to improve inference efficiency while maintaining comparable performance. However, they overlook the gap between generative and discriminative features, leading to suboptimal results. In this work, we propose DepthMaster, a single-step diffusion model designed to adapt generative features for the discriminative depth estimation task. First, to mitigate overfitting to texture details introduced by generative features, we propose a Feature Alignment module, which incorporates high-quality semantic features to enhance the denoising network's representation capability. Second, to address the lack of fine-grained details in the single-step deterministic framework, we propose a Fourier…
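
The abstract describes aligning the denoising network's features with high-quality semantic features but, as excerpted here, does not say which loss is used. As one plausible reading, the sketch below computes a per-position cosine alignment loss between two feature maps. The function name, the `(C, H, W)` layout, and the choice of cosine distance are assumptions for illustration, not the authors' confirmed formulation.

```python
import numpy as np

def cosine_alignment_loss(features, semantic_features):
    """Mean (1 - cosine similarity) over spatial positions.

    Both inputs are assumed to have shape (C, H, W) with matching sizes.
    """
    f = features.reshape(features.shape[0], -1)
    s = semantic_features.reshape(semantic_features.shape[0], -1)
    # Normalise each spatial position's channel vector to unit length.
    f = f / (np.linalg.norm(f, axis=0, keepdims=True) + 1e-8)
    s = s / (np.linalg.norm(s, axis=0, keepdims=True) + 1e-8)
    return float(np.mean(1.0 - np.sum(f * s, axis=0)))

# Random stand-ins for the two feature maps (assumed data).
rng = np.random.default_rng(0)
feats = rng.standard_normal((8, 4, 4))
loss_same = cosine_alignment_loss(feats, feats)
loss_opposite = cosine_alignment_loss(feats, -feats)
```

The loss approaches 0 when the two feature maps point in the same direction at every position and approaches 2 when they point in opposite directions. In practice the two networks would likely produce different channel counts and resolutions, so a projection and resize step would be needed before such a comparison.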

Code & Models

Models

🤗 zysong212/DepthMaster · model · 17 downloads · 9 likes
