RoamScene3D: Immersive Text-to-3D Scene Generation via Adaptive Object-aware Roaming
Jisheng Chu, Wenrui Li, Rui Zhao, Wangmeng Zuo, Shifeng Chen, Xiaopeng Fan

TL;DR
RoamScene3D introduces a novel framework for generating immersive, photorealistic 3D scenes from text by leveraging semantic scene graphs and adaptive camera trajectories, overcoming spatial blindness and inpainting limitations.
Contribution
The paper presents a new approach combining semantic scene graph reasoning with motion-aware inpainting to improve 3D scene generation from text, addressing key limitations of prior methods.
Findings
Outperforms state-of-the-art methods in scene consistency and realism
Effectively models object relations for adaptive scene exploration
Demonstrates robustness to camera motion and occlusions
Abstract
Generating immersive 3D scenes from text is a core task in computer vision, crucial for applications in virtual reality and game development. Despite the promise of leveraging 2D diffusion priors, existing methods suffer from spatial blindness and rely on predefined trajectories that fail to exploit the inner relationships among salient objects. Consequently, these approaches are unable to comprehend the semantic layout, which prevents them from exploring the scene adaptively to infer occluded content. Moreover, current inpainting models operate in 2D image space and struggle to plausibly fill holes caused by camera motion. To address these limitations, we propose RoamScene3D, a novel framework that bridges the gap between semantic guidance and spatial generation. Our method reasons about the semantic relations among objects and produces consistent and photorealistic scenes. Specifically,…
Taxonomy
Topics: 3D Shape Modeling and Analysis · Generative Adversarial Networks and Image Synthesis · Multimodal Machine Learning Applications
