Lotus-2: Advancing Geometric Dense Prediction with Powerful Image Generative Model

Jing He; Haodong Li; Mingzhi Sheng; Ying-Cong Chen

arXiv:2512.01030 · cs.CV · May 19, 2026

TL;DR

Lotus-2 introduces a two-stage deterministic framework leveraging pre-trained diffusion models as priors for accurate, stable geometric dense prediction from a single image, achieving state-of-the-art results with limited data.

Contribution

The paper presents Lotus-2, a novel two-stage deterministic approach that effectively exploits diffusion priors for geometric inference, surpassing previous methods with less training data.

Findings

01. Achieves state-of-the-art monocular depth estimation results.

02. Trains on less than 1% of the data that existing large-scale datasets provide.

03. Demonstrates that pre-trained diffusion models can be adapted for deterministic geometric tasks.

Abstract

Recovering pixel-wise geometric properties from a single image is fundamentally ill-posed due to appearance ambiguity and non-injective mappings between 2D observations and 3D structures. While discriminative regression models achieve strong performance through large-scale supervision, their success is bounded by the scale, quality, and diversity of available data, as well as by limited physical reasoning. Recent diffusion models exhibit powerful world priors that encode geometry and semantics learned from massive image-text data, yet directly reusing their stochastic generative formulation is suboptimal for deterministic geometric inference: the former is optimized for diverse and high-fidelity image generation, whereas the latter requires stable and accurate predictions. In this work, we propose Lotus-2, a two-stage deterministic framework for stable, accurate and fine-grained…
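The "non-injective mappings" the abstract refers to can be illustrated with a toy pinhole camera. The sketch below is not taken from the paper. The `project` helper, the focal length, and the sample points are all hypothetical and chosen only for illustration. It shows that two 3D points at different depths along the same viewing ray land on the same pixel, so a single image cannot by itself determine depth.

```python
import math


def project(point, focal=1.0):
    """Pinhole projection of a camera-space point (X, Y, Z) onto the image plane.

    Hypothetical helper for illustration; assumes Z > 0 and a principal point at (0, 0).
    """
    x, y, z = point
    return (focal * x / z, focal * y / z)


# Two distinct 3D points on the same ray through the camera center.
near = (1.0, 2.0, 4.0)
far = tuple(2.5 * c for c in near)  # same direction, 2.5x farther away

pixel_near = project(near)
pixel_far = project(far)

# Both points map to the same 2D location, so the 3D-to-2D mapping is not injective.
same_pixel = all(math.isclose(a, b) for a, b in zip(pixel_near, pixel_far))
print(pixel_near, pixel_far, same_pixel)
```

Any method that predicts depth from one image therefore has to resolve this ambiguity from learned priors rather than from geometry alone, which is the gap the abstract says diffusion priors can help fill.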

Code & Models

Repositories

envision-research/Lotus-2 (GitHub)

Models

jingheya/Lotus-2 (Hugging Face model, ♡ 11)

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · 3D Shape Modeling and Analysis · Advanced Vision and Imaging