WEDepth: Efficient Adaptation of World Knowledge for Monocular Depth Estimation
Gongshu Wang, Zhirui Wang, Kan Yang

TL;DR
WEDepth introduces a novel method to adapt Vision Foundation Models for monocular depth estimation by leveraging their inherent priors without structural modifications, achieving state-of-the-art results and strong zero-shot transfer capabilities.
Contribution
It presents a new approach to adapt VFMs for MDE that does not require modifying their structure or weights, effectively utilizing their priors for improved depth estimation.
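The summary does not give WEDepth's actual architecture, so the following is only a minimal PyTorch sketch of the general idea: a pretrained backbone is kept frozen and its intermediate features are added to a small trainable network at several levels. Every name here (`TinyBackbone`, `FrozenPriorDepthNet`, the `inject` projections, the layer sizes) is hypothetical, and `TinyBackbone` is a toy stand-in for a real VFM.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def make_stages(dim):
    # Three stride-2 conv stages; each halves the spatial resolution.
    return nn.ModuleList([
        nn.Conv2d(3, dim, 3, stride=2, padding=1),
        nn.Conv2d(dim, dim, 3, stride=2, padding=1),
        nn.Conv2d(dim, dim, 3, stride=2, padding=1),
    ])


class TinyBackbone(nn.Module):
    """Toy stand-in for a pretrained VFM (assumption, not a real model)."""

    def __init__(self, dim=16):
        super().__init__()
        self.stages = make_stages(dim)

    def forward(self, x):
        feats = []
        for stage in self.stages:
            x = F.relu(stage(x))
            feats.append(x)  # keep the feature map of every level
        return feats


class FrozenPriorDepthNet(nn.Module):
    """Hypothetical depth network that reads priors from a frozen backbone."""

    def __init__(self, vfm, dim=16):
        super().__init__()
        self.vfm = vfm
        for p in self.vfm.parameters():
            p.requires_grad = False  # backbone weights are never updated
        self.encoder = make_stages(dim)  # trainable branch
        self.inject = nn.ModuleList([nn.Conv2d(dim, dim, 1) for _ in range(3)])
        self.head = nn.Conv2d(dim, 1, 3, padding=1)

    def forward(self, image):
        with torch.no_grad():
            priors = self.vfm(image)  # multi-level features from the frozen model
        x = image
        for stage, proj, prior in zip(self.encoder, self.inject, priors):
            x = F.relu(stage(x))
            x = x + proj(prior)  # add the prior at this representation level
        depth = F.softplus(self.head(x))  # keep predicted depth non-negative
        return F.interpolate(depth, size=image.shape[-2:],
                             mode="bilinear", align_corners=False)


model = FrozenPriorDepthNet(TinyBackbone())
depth = model(torch.randn(1, 3, 64, 64))  # depth map of shape (1, 1, 64, 64)
```

In this sketch only `encoder`, `inject`, and `head` receive gradients, so the backbone's structure and weights stay exactly as they were loaded.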
Findings
Achieves state-of-the-art performance on NYU-Depth v2 and KITTI datasets.
Demonstrates strong zero-shot transfer capabilities across diverse scenarios.
Outperforms diffusion-based and relative depth pre-trained methods.
Abstract
Monocular depth estimation (MDE) is widely applicable but remains highly challenging due to the inherently ill-posed nature of reconstructing 3D scenes from single 2D images. Modern Vision Foundation Models (VFMs), pre-trained on large-scale diverse datasets, exhibit remarkable world understanding capabilities that benefit various vision tasks. Recent studies have demonstrated significant improvements in MDE through fine-tuning these VFMs. Inspired by these developments, we propose WEDepth, a novel approach that adapts VFMs for MDE without modifying their structures or pretrained weights, while effectively eliciting and leveraging their inherent priors. Our method employs the VFM as a multi-level feature enhancer, systematically injecting prior knowledge at different representation levels. Experiments on NYU-Depth v2 and KITTI datasets show that WEDepth establishes new…
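The ill-posedness mentioned in the abstract can be made concrete with a pinhole camera: scaling a scene and its distance by the same factor leaves the image unchanged, so a single image cannot fix absolute depth without priors. The short Python sketch below illustrates this; the focal length and the 3D points are made-up values chosen only for the example.

```python
import math


def project(point, focal=500.0):
    """Pinhole projection of a 3D point (X, Y, Z) to image offsets (u, v)."""
    x, y, z = point
    return (focal * x / z, focal * y / z)


# A toy scene (illustrative values) and the same scene made twice as large
# and placed twice as far from the camera.
scene = [(0.5, 0.2, 2.0), (-1.0, 0.4, 4.0), (0.3, -0.6, 3.0)]
scaled = [(2 * x, 2 * y, 2 * z) for x, y, z in scene]

image_a = [project(p) for p in scene]
image_b = [project(p) for p in scaled]

# Both scenes project to the same pixels even though every depth differs.
same_image = all(
    math.isclose(ua, ub) and math.isclose(va, vb)
    for (ua, va), (ub, vb) in zip(image_a, image_b)
)
depths_differ = all(a[2] != b[2] for a, b in zip(scene, scaled))
```

Here `same_image` and `depths_differ` are both true, which is why learned priors about object sizes and scene layout matter for recovering depth from one view.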
Taxonomy
Topics: Advanced Vision and Imaging · Advanced Image Processing Techniques · Advanced Neural Network Applications
