TL;DR
Lotus-2 introduces a two-stage deterministic framework leveraging pre-trained diffusion models as priors for accurate, stable geometric dense prediction from a single image, achieving state-of-the-art results with limited data.
Contribution
The paper presents Lotus-2, a novel two-stage deterministic approach that effectively exploits diffusion priors for geometric inference, surpassing previous methods with less training data.
Findings
Achieves state-of-the-art monocular depth estimation results.
Trains on less than 1% of the data in existing large-scale datasets.
Demonstrates diffusion models can be adapted for deterministic geometric tasks.
Abstract
Recovering pixel-wise geometric properties from a single image is fundamentally ill-posed due to appearance ambiguity and non-injective mappings between 2D observations and 3D structures. While discriminative regression models achieve strong performance through large-scale supervision, their success is bounded by the scale, quality, and diversity of available data, as well as by limited physical reasoning. Recent diffusion models exhibit powerful world priors that encode geometry and semantics learned from massive image-text data, yet directly reusing their stochastic generative formulation is suboptimal for deterministic geometric inference: the former is optimized for diverse and high-fidelity image generation, whereas the latter requires stable and accurate predictions. In this work, we propose Lotus-2, a two-stage deterministic framework for stable, accurate and fine-grained…
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · 3D Shape Modeling and Analysis · Advanced Vision and Imaging
