SceneLCM: End-to-End Layout-Guided Interactive Indoor Scene Generation with Latent Consistency Model

Yangkai Lin; Jiabao Lei; Kui Jia

arXiv:2506.07091·cs.CV·June 10, 2025

Open Access

TL;DR

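SceneLCM is an end-to-end framework that combines a large language model (LLM) for layout design with a latent consistency model (LCM) for scene optimization to generate, optimize, and physically edit complex indoor scenes from user prompts. It targets limitations of earlier methods: rigid editing constraints, physical incoherence, excessive human effort, single-room scope, and suboptimal material quality.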

Contribution

The paper introduces a modular pipeline that integrates LLM-guided layout design, CTS-based scene optimization, and physical editing, and provides theoretical justification for the sampling loss.

Findings

01. Outperforms state-of-the-art scene generation methods.

02. Produces high-quality, physically realistic indoor scenes.

03. Enables flexible editing and multi-room scene synthesis.

Abstract

Our project page: https://scutyklin.github.io/SceneLCM/. Automated generation of complex, interactive indoor scenes tailored to user prompts remains a formidable challenge. While existing methods achieve indoor scene synthesis, they struggle with rigid editing constraints, physical incoherence, excessive human effort, single-room limitations, and suboptimal material quality. To address these limitations, we propose SceneLCM, an end-to-end framework that combines a Large Language Model (LLM) for layout design with a Latent Consistency Model (LCM) for scene optimization. Our approach decomposes scene generation into four modular pipelines: (1) Layout Generation. We employ LLM-guided 3D spatial reasoning to convert textual descriptions into parametric blueprints (3D layouts). An iterative programmatic validation mechanism refines the layout parameters through LLM-mediated dialogue…
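
The layout-refinement loop described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. The `Box` type, the 2D footprint simplification of the 3D layout, the two validation rules (room bounds and pairwise overlap), and the `propose` callback standing in for the LLM are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Box:
    """Hypothetical axis-aligned furniture footprint (min corner plus size)."""
    name: str
    x: float
    y: float
    w: float
    d: float


def overlaps(a: Box, b: Box) -> bool:
    # Two footprints overlap when their intervals intersect on both axes.
    return (a.x < b.x + b.w and b.x < a.x + a.w
            and a.y < b.y + b.d and b.y < a.y + a.d)


def validate(layout: List[Box], room_w: float, room_d: float) -> List[str]:
    """Programmatic check: return human-readable errors (empty if valid)."""
    errors = []
    for b in layout:
        if b.x < 0 or b.y < 0 or b.x + b.w > room_w or b.y + b.d > room_d:
            errors.append(f"{b.name} is outside the room")
    for i, a in enumerate(layout):
        for b in layout[i + 1:]:
            if overlaps(a, b):
                errors.append(f"{a.name} overlaps {b.name}")
    return errors


def refine_layout(propose: Callable[[List[str]], List[Box]],
                  room_w: float, room_d: float,
                  max_rounds: int = 5) -> List[Box]:
    """Ask the proposer for a layout, validate it, and feed errors back."""
    feedback: List[str] = []
    for _ in range(max_rounds):
        layout = propose(feedback)  # stands in for one LLM dialogue turn
        feedback = validate(layout, room_w, room_d)
        if not feedback:
            return layout
    raise ValueError(f"no valid layout after {max_rounds} rounds: {feedback}")


def stub_propose(feedback: List[str]) -> List[Box]:
    # Stand-in for the LLM: the first proposal overlaps, the second fixes it.
    if not feedback:
        return [Box("bed", 0, 0, 2, 2), Box("desk", 1, 1, 1.5, 1)]
    return [Box("bed", 0, 0, 2, 2), Box("desk", 2.5, 0, 1.5, 1)]


layout = refine_layout(stub_propose, room_w=4.0, room_d=3.0)
```

In a real system the `propose` callback would send the error messages back to the LLM as part of the dialogue. The stub here only shows how validation feedback drives the next proposal.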

Taxonomy

Topics: 3D Shape Modeling and Analysis · Generative Adversarial Networks and Image Synthesis · Multimodal Machine Learning Applications