FreeScene: Mixed Graph Diffusion for 3D Scene Synthesis from Free Prompts

Tongyuan Bai; Wangyuanfan Bai; Dong Chen; Tieru Wu; Manyi Li; Rui Ma

arXiv:2506.02781·cs.CV·June 4, 2025

TL;DR

FreeScene introduces a unified, user-friendly framework for 3D indoor scene synthesis that combines text and graph controls, utilizing a novel graph diffusion transformer to improve quality and flexibility.

Contribution

The paper presents FreeScene, a novel framework integrating free-form user inputs with a graph diffusion transformer for enhanced controllability and quality in 3D scene synthesis.

Findings

01. Outperforms state-of-the-art methods in quality and controllability.

02. Supports versatile tasks including text-to-scene and scene rearrangement.

03. Provides an efficient, unified approach for 3D scene generation.

Abstract

Controllability plays a crucial role in the practical applications of 3D indoor scene synthesis. Existing works either allow rough language-based control, which is convenient but lacks fine-grained scene customization, or employ graph-based control, which offers better controllability but demands considerable knowledge for the cumbersome graph design process. To address these challenges, we present FreeScene, a user-friendly framework that enables both convenient and effective control for indoor scene synthesis. Specifically, FreeScene supports free-form user inputs, including text descriptions and/or reference images, allowing users to express versatile design intentions. The user inputs are adequately analyzed and integrated into a graph representation by a VLM-based Graph Designer. We then propose MG-DiT, a Mixed Graph Diffusion Transformer, which performs graph-aware denoising to…
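
To make the idea of a graph representation of a scene more concrete, here is a minimal sketch in Python. It is not FreeScene's actual data structure: the class `SceneGraph`, its methods, and the relation label `"left of"` are all hypothetical names chosen for illustration. It only assumes that nodes stand for objects and edges for pairwise spatial relations.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class SceneGraph:
    """Hypothetical scene graph (illustrative only, not the paper's format).

    Nodes are object categories; edges are (source id, relation, target id).
    """
    nodes: List[str] = field(default_factory=list)
    edges: List[Tuple[int, str, int]] = field(default_factory=list)

    def add_object(self, category: str) -> int:
        """Add an object node and return its integer id."""
        self.nodes.append(category)
        return len(self.nodes) - 1

    def relate(self, src: int, relation: str, dst: int) -> None:
        """Add a directed spatial relation between two existing nodes."""
        if not (0 <= src < len(self.nodes) and 0 <= dst < len(self.nodes)):
            raise IndexError("edge endpoints must be existing node ids")
        self.edges.append((src, relation, dst))

    def describe(self) -> List[str]:
        """Render each edge as a short human-readable phrase."""
        return [f"{self.nodes[s]} {r} {self.nodes[d]}" for s, r, d in self.edges]


# Example: a prompt such as "a nightstand to the left of the bed"
# could plausibly be summarized by a graph like this one.
graph = SceneGraph()
bed = graph.add_object("bed")
nightstand = graph.add_object("nightstand")
graph.relate(nightstand, "left of", bed)
print(graph.describe())
```

A structure along these lines is the kind of intermediate a graph designer could emit and a downstream generator could consume; how FreeScene actually encodes nodes, edges, and their attributes is not stated in the abstract.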

Taxonomy

Topics: 3D Shape Modeling and Analysis · Human Motion and Animation · Human Pose and Action Recognition

Methods: Absolute Position Encodings · Layer Normalization · Byte Pair Encoding · Label Smoothing · Softmax · Dropout · Dense Connections · Transformer · Diffusion