TL;DR
This paper introduces a novel feature restoration approach using diffusion models for monocular depth estimation, improving accuracy by leveraging invertible transforms and auxiliary features.
Contribution
It proposes a diffusion-based feature restoration framework with invertible transforms and auxiliary viewpoint enhancement for improved monocular depth estimation.
Findings
Achieves better performance than state-of-the-art on multiple datasets.
Improves KITTI benchmark RMSE by 4.09% and 37.77%.
Demonstrates the effectiveness of feature restoration via diffusion models.
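The RMSE figures above are percentages, and this summary does not say what they are measured against. As a minimal sketch, assuming they are relative reductions in RMSE against a baseline, the calculation would look like the following. The depth values and the helper names `rmse` and `relative_improvement` are made up for illustration and are not results or code from the paper.

```python
import math

def rmse(pred, target):
    """Root-mean-square error over paired depth values (same units, e.g. metres)."""
    if len(pred) != len(target) or not pred:
        raise ValueError("pred and target must be non-empty and of equal length")
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred))

def relative_improvement(baseline_rmse, new_rmse):
    """Percentage by which new_rmse is lower than baseline_rmse (assumed definition)."""
    return 100.0 * (baseline_rmse - new_rmse) / baseline_rmse

# Made-up depths for illustration only; these are not numbers from the paper.
target = [10.0, 20.0, 30.0, 40.0]
baseline_pred = [12.0, 18.0, 33.0, 37.0]
restored_pred = [11.0, 19.0, 31.5, 38.5]

base = rmse(baseline_pred, target)
new = rmse(restored_pred, target)
print(f"{relative_improvement(base, new):.2f}%")  # prints 50.00%
```

Under this definition a larger percentage means a larger drop in error, so the two reported figures would correspond to two separate comparisons.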
Abstract
Monocular Depth Estimation (MDE) is a fundamental computer vision task with important applications in 3D vision. The current mainstream MDE methods employ an encoder-decoder architecture with multi-level/scale feature processing. However, the limitations of the current architecture and the effects of different-level features on the prediction accuracy are not evaluated. In this paper, we first investigate the above problem and show that there is still substantial potential in the current framework if encoder features can be improved. Therefore, we propose to formulate the depth estimation problem from the feature restoration perspective, by treating pretrained encoder features as degraded features of an assumed ground truth feature that yields the ground truth depth map. Then an Invertible Transform-enhanced Indirect Diffusion (InvT-IndDiffusion) module is developed for feature…
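The abstract names an invertible transform, but the excerpt is cut off before describing its form. As a generic illustration of what "invertible" means for a feature transform, here is a sketch of an affine coupling layer, a standard construction from normalizing flows. It is an assumption chosen for illustration, not the paper's InvT-IndDiffusion module; the class name `AffineCoupling` and its random linear weights are hypothetical.

```python
import numpy as np

class AffineCoupling:
    """Generic affine coupling layer: invertible by construction.

    Illustrative only; not the transform used in the paper.
    """

    def __init__(self, dim, seed=0):
        rng = np.random.default_rng(seed)
        self.half = dim // 2
        rest = dim - self.half
        # Hypothetical tiny linear maps standing in for scale/shift networks.
        self.w_s = rng.normal(scale=0.1, size=(self.half, rest))
        self.w_t = rng.normal(scale=0.1, size=(self.half, rest))

    def forward(self, x):
        # The first half passes through unchanged and parameterizes
        # an elementwise affine map applied to the second half.
        x1, x2 = x[..., :self.half], x[..., self.half:]
        s = np.tanh(x1 @ self.w_s)
        t = x1 @ self.w_t
        return np.concatenate([x1, x2 * np.exp(s) + t], axis=-1)

    def inverse(self, y):
        # Because the first half is unchanged, s and t can be recomputed
        # exactly and the affine map undone in closed form.
        y1, y2 = y[..., :self.half], y[..., self.half:]
        s = np.tanh(y1 @ self.w_s)
        t = y1 @ self.w_t
        return np.concatenate([y1, (y2 - t) * np.exp(-s)], axis=-1)

layer = AffineCoupling(dim=8)
features = np.random.default_rng(1).normal(size=(4, 8))  # stand-in "encoder features"
recovered = layer.inverse(layer.forward(features))
print(np.allclose(features, recovered))  # prints True
```

The property shown is that features mapped into the transformed space can be mapped back without loss, so an operation carried out in that space has a well-defined counterpart in the original feature space.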
