Iris: Integrating Language into Diffusion-based Monocular Depth Estimation
Ziyao Zeng, Jingcheng Ni, Daniel Wang, Patrick Rim, Younjoon Chung, Fengyu Yang, Byung-Woo Hong, Alex Wong

TL;DR
This paper introduces Iris, a method that uses textual descriptions as an additional condition for diffusion-based monocular depth estimation. This improves accuracy and scene understanding, especially in small regions.
Contribution
The paper presents a novel approach for integrating language into diffusion-based depth estimation models. Compared with previous purely visual methods, this improves both their accuracy and their interpretability.
Findings
Improved depth estimation accuracy, especially in small regions.
Enhanced depth perception in specific regions described by the text.
Language acts as a constraint to accelerate training and inference.
Abstract
Traditional monocular depth estimation suffers from inherent ambiguity and visual nuisances. We demonstrate that language can enhance monocular depth estimation by providing an additional condition (rather than images alone) aligned with plausible 3D scenes, thereby reducing the solution space for depth estimation. This conditional distribution is learned during the text-to-image pre-training of diffusion models. To generate images under various viewpoints and layouts that precisely reflect textual descriptions, the model must implicitly capture object sizes, shapes, and scales, their spatial relationships, and the overall scene structure. In this paper, we present Iris and investigate the benefits of integrating text descriptions into both the training and the inference of diffusion-based depth estimation models. We experiment with three different diffusion-based monocular depth estimators (Marigold,…
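The abstract argues that a text condition shrinks the space of plausible depths. A toy pinhole-camera example can make that intuition concrete. This is only a sketch under stated assumptions and not Iris's method, which conditions a diffusion model on text embeddings. The focal length, the measured pixel height, the size table, and the names `CANDIDATE_SIZES`, `depth_from_size`, and `feasible_depths` are all hypothetical and exist only for illustration. Under a pinhole camera, apparent size equals focal length times true size divided by depth. A single image measurement is therefore consistent with many (size, depth) pairs, and a text label that pins down the object category collapses those pairs to one.

```python
# Toy illustration only (hypothetical numbers, not the Iris model):
# pinhole projection gives apparent_size = FOCAL * true_size / depth,
# so one pixel measurement is consistent with many (size, depth) pairs.

FOCAL = 500.0          # hypothetical focal length, in pixels
APPARENT_SIZE = 100.0  # hypothetical measured object height, in pixels

# Hypothetical object heights (meters) the image alone cannot rule out.
CANDIDATE_SIZES = {
    "mug": 0.1,
    "chair": 1.0,
    "car": 1.5,
    "truck": 3.5,
}


def depth_from_size(true_size: float) -> float:
    """Invert the pinhole relation: depth = FOCAL * true_size / APPARENT_SIZE."""
    return FOCAL * true_size / APPARENT_SIZE


def feasible_depths(text=None):
    """Return sorted depth hypotheses (meters).

    With no text, every candidate size is plausible. With a text label,
    only the matching category's size survives, which narrows the depths.
    """
    if text is None:
        sizes = list(CANDIDATE_SIZES.values())
    else:
        sizes = [s for name, s in CANDIDATE_SIZES.items() if name == text]
    return sorted(depth_from_size(s) for s in sizes)


if __name__ == "__main__":
    print("image only:", feasible_depths())          # four hypotheses, 0.5 m to 17.5 m
    print("image + 'car':", feasible_depths("car"))  # a single hypothesis
```

Real scenes are far less clean than this. Text gives a soft prior rather than an exact filter, and Iris learns that prior from text-to-image pre-training instead of a lookup table. Still, the direction of the effect is the same one the abstract describes: the added condition removes depth hypotheses that the image alone cannot exclude.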
Taxonomy
Topics: 3D Shape Modeling and Analysis
Methods: ALIGN · Diffusion
