RoamScene3D: Immersive Text-to-3D Scene Generation via Adaptive Object-aware Roaming

Jisheng Chu; Wenrui Li; Rui Zhao; Wangmeng Zuo; Shifeng Chen; Xiaopeng Fan

arXiv:2601.19433 · cs.CV · January 28, 2026

TL;DR

RoamScene3D introduces a novel framework for generating immersive, photorealistic 3D scenes from text by leveraging semantic scene graphs and adaptive camera trajectories, overcoming spatial blindness and inpainting limitations.

Contribution

The paper presents a new approach combining semantic scene graph reasoning with motion-aware inpainting to improve 3D scene generation from text, addressing key limitations of prior methods.

Findings

01 Outperforms state-of-the-art methods in scene consistency and realism

02 Effectively models object relations for adaptive scene exploration

03 Demonstrates robustness to camera motion and occlusions

Abstract

Generating immersive 3D scenes from text is a core task in computer vision, crucial for applications in virtual reality and game development. Despite the promise of leveraging 2D diffusion priors, existing methods suffer from spatial blindness and rely on predefined trajectories that fail to exploit the inner relationships among salient objects. Consequently, these approaches are unable to comprehend the semantic layout, preventing them from exploring the scene adaptively to infer occluded content. Moreover, current inpainting models operate in 2D image space, struggling to plausibly fill holes caused by camera motion. To address these limitations, we propose RoamScene3D, a novel framework that bridges the gap between semantic guidance and spatial generation. Our method reasons about the semantic relations among objects and produces consistent and photorealistic scenes. Specifically,…

Taxonomy

Topics: 3D Shape Modeling and Analysis · Generative Adversarial Networks and Image Synthesis · Multimodal Machine Learning Applications