Diorama: Unleashing Zero-shot Single-view 3D Indoor Scene Modeling
Qirui Wu, Denys Iliash, Daniel Ritchie, Manolis Savva, and Angel X. Chang

TL;DR
Diorama is a zero-shot system that reconstructs 3D indoor scenes from a single RGB image without end-to-end training or human annotations, leveraging modular solutions for scene understanding and demonstrating strong generalization.
Contribution
It introduces Diorama, the first zero-shot open-world 3D scene modeling system from single images, avoiding end-to-end training and human annotations.
Findings
Outperforms prior methods on synthetic and real data
Generalizes to internet images and text-to-scene tasks
Successfully decomposes scene modeling into robust subtasks
Abstract
Reconstructing structured 3D scenes from RGB images using CAD objects unlocks efficient and compact scene representations that maintain compositionality and interactability. Existing works propose training-heavy methods relying on either expensive yet inaccurate real-world annotations or controllable yet monotonous synthetic data that do not generalize well to unseen objects or domains. We present Diorama, the first zero-shot open-world system that holistically models 3D scenes from single-view RGB observations without requiring end-to-end training or human annotations. We show the feasibility of our approach by decomposing the problem into subtasks and introduce robust, generalizable solutions to each: architecture reconstruction, 3D shape retrieval, object pose estimation, and scene layout optimization. We evaluate our system on both synthetic and real-world data to show we…
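The decomposition named in the abstract can be pictured as a chain of independent stages that each enrich a shared scene record. The following is a minimal sketch under stated assumptions, not the authors' implementation: the `Scene` container, the stage function names, and `run_pipeline` are all hypothetical, the stage bodies are placeholders that only record that they ran, and running the stages in the order the abstract lists them is an assumption.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Sequence


@dataclass
class Scene:
    """Hypothetical container for intermediate results (not the paper's data model)."""
    image_path: str
    architecture: Dict[str, str] = field(default_factory=dict)
    objects: List[Dict[str, Optional[str]]] = field(default_factory=list)
    completed: List[str] = field(default_factory=list)


# Each stage takes a Scene and returns it with more fields filled in.
Stage = Callable[[Scene], Scene]


def reconstruct_architecture(scene: Scene) -> Scene:
    # Placeholder: a real stage would estimate walls, floor, and ceiling.
    scene.architecture = {"status": "placeholder"}
    scene.completed.append("architecture reconstruction")
    return scene


def retrieve_shapes(scene: Scene) -> Scene:
    # Placeholder: a real stage would match observed objects to CAD models.
    scene.objects.append({"cad_model": "placeholder", "pose": None})
    scene.completed.append("3D shape retrieval")
    return scene


def estimate_poses(scene: Scene) -> Scene:
    # Placeholder: a real stage would estimate a pose for each retrieved shape.
    for obj in scene.objects:
        obj["pose"] = "placeholder"
    scene.completed.append("object pose estimation")
    return scene


def optimize_layout(scene: Scene) -> Scene:
    # Placeholder: a real stage would refine object placement jointly.
    scene.completed.append("scene layout optimization")
    return scene


# Assumed ordering: the order in which the abstract lists the subtasks.
DEFAULT_STAGES: Sequence[Stage] = (
    reconstruct_architecture,
    retrieve_shapes,
    estimate_poses,
    optimize_layout,
)


def run_pipeline(image_path: str, stages: Sequence[Stage] = DEFAULT_STAGES) -> Scene:
    """Run every stage in order on a fresh Scene built from one image path."""
    scene = Scene(image_path=image_path)
    for stage in stages:
        scene = stage(scene)
    return scene


if __name__ == "__main__":
    result = run_pipeline("room.jpg")  # hypothetical input path
    print(result.completed)
```

Keeping each stage behind the same `Scene -> Scene` signature is one way to let a single subtask be swapped out without touching the others, which is the practical appeal of a modular design over end-to-end training.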
Taxonomy
Topics: Advanced Vision and Imaging · Computer Graphics and Visualization Techniques · Image Enhancement Techniques
