FreeScene: Mixed Graph Diffusion for 3D Scene Synthesis from Free Prompts
Tongyuan Bai, Wangyuanfan Bai, Dong Chen, Tieru Wu, Manyi Li, Rui Ma

TL;DR
FreeScene introduces a unified, user-friendly framework for 3D indoor scene synthesis that combines text and graph controls, utilizing a novel graph diffusion transformer to improve quality and flexibility.
Contribution
The paper presents FreeScene, a novel framework integrating free-form user inputs with a graph diffusion transformer for enhanced controllability and quality in 3D scene synthesis.
Findings
Outperforms state-of-the-art methods in quality and controllability.
Supports versatile tasks including text-to-scene and scene rearrangement.
Provides an efficient, unified approach for 3D scene generation.
Abstract
Controllability plays a crucial role in the practical applications of 3D indoor scene synthesis. Existing works either allow rough language-based control, that is convenient but lacks fine-grained scene customization, or employ graph based control, which offers better controllability but demands considerable knowledge for the cumbersome graph design process. To address these challenges, we present FreeScene, a user-friendly framework that enables both convenient and effective control for indoor scene synthesis.Specifically, FreeScene supports free-form user inputs including text description and/or reference images, allowing users to express versatile design intentions. The user inputs are adequately analyzed and integrated into a graph representation by a VLM-based Graph Designer. We then propose MG-DiT, a Mixed Graph Diffusion Transformer, which performs graph-aware denoising to…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
Topics3D Shape Modeling and Analysis · Human Motion and Animation · Human Pose and Action Recognition
MethodsAbsolute Position Encodings · Layer Normalization · Byte Pair Encoding · Label Smoothing · Softmax · Dropout · Dense Connections · Transformer · Diffusion
