FlowScene: Style-Consistent Indoor Scene Generation with Multimodal Graph Rectified Flow
Zhifei Yang, Guangyao Zhai, Keyang Lu, YuYang Yin, Chao Zhang, Zhen Xiao, Jieyi Long, Nassir Navab, Yikai Wang

TL;DR
FlowScene is a multimodal graph-based generative model that produces high-fidelity, style-coherent indoor scenes with fine-grained control over object geometry and appearance, outperforming existing methods in realism, style coherence, and alignment with human preferences.
Contribution
It introduces a tri-branch model with a rectified flow mechanism for collaborative scene generation, enabling detailed control and style consistency in indoor scene synthesis.
Findings
Outperforms baselines in realism and style coherence
Enables fine-grained control over object attributes
Achieves better alignment with human preferences
Abstract
Scene generation has extensive industrial applications, demanding both high realism and precise control over geometry and appearance. Language-driven retrieval methods compose plausible scenes from a large object database, but overlook object-level control and often fail to enforce scene-level style coherence. Graph-based formulations offer higher controllability over objects and inform holistic consistency by explicitly modeling relations, yet existing methods struggle to produce high-fidelity textured results, thereby limiting their practical utility. We present FlowScene, a tri-branch scene generative model conditioned on multimodal graphs that collaboratively generates scene layouts, object shapes, and object textures. At its core lies a tightly coupled rectified flow model that exchanges object information during generation, enabling collaborative reasoning across the graph. This…
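For readers unfamiliar with rectified flow, the following is a minimal sketch of the generic technique only, not FlowScene's model: it omits the graph conditioning, the three branches, and the exchange of object information described in the abstract. It assumes the standard formulation in which a sample moves along a straight line from noise x0 to data x1, the regression target is the constant velocity x1 - x0, and sampling integrates the velocity field with fixed-step Euler. All function names here are hypothetical and chosen for illustration.

```python
def interpolate(x0, x1, t):
    """Point on the straight path from noise x0 (t=0) to data x1 (t=1)."""
    return [(1.0 - t) * a + t * b for a, b in zip(x0, x1)]


def target_velocity(x0, x1):
    """Regression target in rectified flow: the constant velocity x1 - x0."""
    return [b - a for a, b in zip(x0, x1)]


def euler_sample(velocity_fn, x0, num_steps=10):
    """Integrate dx/dt = v(x, t) from t=0 to t=1 with fixed-step Euler."""
    x = list(x0)
    dt = 1.0 / num_steps
    for i in range(num_steps):
        t = i * dt
        v = velocity_fn(x, t)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


if __name__ == "__main__":
    noise = [0.0, 0.0]
    data = [2.0, -4.0]

    # Stand-in for a trained network (assumption): return the ideal
    # straight-line velocity for this single noise/data pair.
    def ideal_velocity(x, t):
        return target_velocity(noise, data)

    print(interpolate(noise, data, 0.5))
    print(euler_sample(ideal_velocity, noise, num_steps=4))
```

In a trained model the velocity function would be a neural network fitted to `target_velocity` at points produced by `interpolate`; here an ideal velocity stands in for it so the integration loop can be run on its own.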
Taxonomy
Topics: Multimodal Machine Learning Applications · Generative Adversarial Networks and Image Synthesis · 3D Shape Modeling and Analysis
