SceneLCM: End-to-End Layout-Guided Interactive Indoor Scene Generation with Latent Consistency Model
Yangkai Lin, Jiabao Lei, Kui Jia

TL;DR
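SceneLCM is an end-to-end framework that combines a large language model for layout design with a latent consistency model for scene optimization to generate, optimize, and physically edit complex indoor scenes from user prompts. It targets limitations of earlier methods: rigid editing constraints, physical incoherence, heavy manual effort, single-room scope, and suboptimal material quality.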
Contribution
It introduces a modular pipeline integrating LLM-guided layout design, CTS-based scene optimization, and physical editing, with theoretical justifications for the sampling loss.
Findings
Outperforms state-of-the-art scene generation methods.
Produces high-quality, physically realistic indoor scenes.
Enables flexible editing and multi-room scene synthesis.
Abstract
Our project page: https://scutyklin.github.io/SceneLCM/. Automated generation of complex, interactive indoor scenes tailored to user prompts remains a formidable challenge. While existing methods achieve indoor scene synthesis, they struggle with rigid editing constraints, physical incoherence, excessive human effort, single-room limitations, and suboptimal material quality. To address these limitations, we propose SceneLCM, an end-to-end framework that combines a Large Language Model (LLM) for layout design with a Latent Consistency Model (LCM) for scene optimization. Our approach decomposes scene generation into four modular pipelines: (1) Layout Generation. We employ LLM-guided 3D spatial reasoning to convert textual descriptions into parametric blueprints (3D layouts). An iterative programmatic validation mechanism then refines the layout parameters through LLM-mediated dialogue…
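The validate-and-refine loop described in the Layout Generation step can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the `Box` footprint representation, the two checks (room bounds and pairwise overlap), and the `propose_fix` callback, which stands in for the LLM dialogue turn. The `toy_fix` function is a deterministic placeholder so the example runs without any model.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Box:
    """Hypothetical axis-aligned furniture footprint (min corner plus size)."""
    name: str
    x: float
    y: float
    w: float
    d: float


def overlaps(a: Box, b: Box) -> bool:
    # Strict inequalities: boxes that only touch along an edge do not overlap.
    return (a.x < b.x + b.w and b.x < a.x + a.w
            and a.y < b.y + b.d and b.y < a.y + a.d)


def validate(layout: List[Box], room_w: float, room_d: float) -> List[str]:
    """Programmatic checks; returns human-readable error messages."""
    errors = []
    for b in layout:
        if b.x < 0 or b.y < 0 or b.x + b.w > room_w or b.y + b.d > room_d:
            errors.append(f"{b.name} is outside the room")
    for i, a in enumerate(layout):
        for b in layout[i + 1:]:
            if overlaps(a, b):
                errors.append(f"{a.name} overlaps {b.name}")
    return errors


def refine_layout(
    layout: List[Box],
    room_w: float,
    room_d: float,
    propose_fix: Callable[[List[Box], List[str]], List[Box]],
    max_rounds: int = 5,
) -> Tuple[List[Box], bool]:
    """Alternate validation and correction until the layout passes or rounds run out."""
    for _ in range(max_rounds):
        errors = validate(layout, room_w, room_d)
        if not errors:
            return layout, True
        # In an LLM-driven system, the errors would be sent back as feedback here.
        layout = propose_fix(layout, errors)
    return layout, not validate(layout, room_w, room_d)


def toy_fix(layout: List[Box], errors: List[str]) -> List[Box]:
    """Placeholder for the LLM: slide the later box of each overlapping pair to the right."""
    fixed = [Box(b.name, b.x, b.y, b.w, b.d) for b in layout]
    for i, a in enumerate(fixed):
        for b in fixed[i + 1:]:
            if overlaps(a, b):
                b.x = a.x + a.w
    return fixed


initial = [Box("bed", 0, 0, 2, 2), Box("desk", 1, 0, 1, 1)]
result, ok = refine_layout(initial, room_w=5, room_d=4, propose_fix=toy_fix)
```

Keeping the checks in ordinary code and passing only the error messages to the model is one plausible reading of "programmatic validation"; the paper's actual checks and prompt format may differ.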
Taxonomy
Topics: 3D Shape Modeling and Analysis · Generative Adversarial Networks and Image Synthesis · Multimodal Machine Learning Applications
