Diorama: Unleashing Zero-shot Single-view 3D Indoor Scene Modeling

Qirui Wu, Denys Iliash, Daniel Ritchie, Manolis Savva, and Angel X. Chang

arXiv:2411.19492 · cs.CV · March 18, 2025

Open Access

TL;DR

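Diorama is a zero-shot system that reconstructs 3D indoor scenes from a single RGB image without end-to-end training or human annotations. It splits the task into modular subtasks (architecture reconstruction, 3D shape retrieval, object pose estimation, and scene layout optimization) and is evaluated on both synthetic and real-world data.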

Contribution

The paper introduces Diorama, which the authors describe as the first zero-shot open-world system for modeling 3D scenes from a single image, avoiding end-to-end training and human annotations.

Findings

01 Outperforms prior methods on synthetic and real data

02 Generalizes to internet images and text-to-scene tasks

03 Successfully decomposes scene modeling into robust subtasks

Abstract

Reconstructing structured 3D scenes from RGB images using CAD objects unlocks efficient and compact scene representations that maintain compositionality and interactability. Existing works propose training-heavy methods relying on either expensive yet inaccurate real-world annotations or controllable yet monotonous synthetic data that do not generalize well to unseen objects or domains. We present Diorama, the first zero-shot open-world system that holistically models 3D scenes from single-view RGB observations without requiring end-to-end training or human annotations. We show the feasibility of our approach by decomposing the problem into subtasks and introduce robust, generalizable solutions to each: architecture reconstruction, 3D shape retrieval, object pose estimation, and scene layout optimization. We evaluate our system on both synthetic and real-world data to show we…

Taxonomy

Topics: Advanced Vision and Imaging · Computer Graphics and Visualization Techniques · Image Enhancement Techniques