TL;DR
This paper introduces Code-as-Room, an MLLM-based agentic framework that generates 3D indoor room models from top-down images by writing executable Blender code, improving spatial accuracy and stability over prior methods.
Contribution
The paper presents an agentic framework with a structured execution harness for 3D room synthesis from top-down images, along with a new benchmark for evaluation.
Findings
Outperforms existing agent-based methods in 3D room synthesis accuracy.
Effectively captures spatial relationships and scene elements from top-down images.
Runs stably and reduces the looping issues seen in holistic room generation.
Abstract
Designing realistic and functional 3D indoor rooms is essential for a wide range of applications, including interior design, virtual reality, gaming, and embodied AI. While recent MLLM-based approaches have shown great potential for 3D room synthesis from textual descriptions or reference images, text-based methods struggle to capture precise spatial information, and existing image-conditioned agents suffer from instability and infinite looping when tasked with holistic room generation from top-down views. To address these limitations, we propose Code-as-Room, an MLLM-based agentic framework equipped with a structured execution harness, which represents 3D rooms as Blender code. Given a top-down room image, the framework parses the reference image to extract scene elements and their spatial relationships, and synthesizes executable Blender code for geometry, materials, and lighting…
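As an illustration only, the idea of representing a room as Blender code and refining it under a fixed step budget might be sketched as follows. This is not the paper's implementation: `SceneObject`, `emit_blender_script`, and `refine_with_budget` are hypothetical names, the parsed scene is assumed to be a list of axis-aligned boxes, and the `check` and `revise` callbacks stand in for whatever verification and MLLM revision the actual harness performs. The generated script uses the real Blender calls `bpy.ops.mesh.primitive_cube_add` and `bpy.context.active_object`, but it is only built as text here, so the sketch runs without Blender installed.

```python
from dataclasses import dataclass


@dataclass
class SceneObject:
    """Hypothetical parsed scene element: an axis-aligned box."""
    name: str
    location: tuple  # (x, y, z) center, in meters
    size: tuple      # (width, depth, height), in meters


def emit_blender_script(objects):
    """Return Blender Python source that places one box per scene element."""
    lines = ["import bpy", ""]
    for obj in objects:
        # A unit cube scaled by `size` ends up with exactly those dimensions.
        lines.append(
            f"bpy.ops.mesh.primitive_cube_add(size=1.0, location={tuple(obj.location)!r})"
        )
        lines.append("obj = bpy.context.active_object")
        lines.append(f"obj.name = {obj.name!r}")
        lines.append(f"obj.scale = {tuple(obj.size)!r}")
        lines.append("")
    return "\n".join(lines)


def refine_with_budget(script, check, revise, max_steps=3):
    """Revise `script` until `check` passes or the step budget runs out.

    The fixed budget is an assumed way to rule out infinite looping;
    `check` and `revise` are placeholders supplied by the caller.
    """
    for _ in range(max_steps):
        if check(script):
            return script, True
        script = revise(script)
    return script, check(script)


room = [
    SceneObject("Bed", (1.0, 2.0, 0.25), (1.6, 2.0, 0.5)),
    SceneObject("Desk", (3.0, 0.5, 0.375), (1.2, 0.6, 0.75)),
]
script = emit_blender_script(room)
```

The resulting `script` string could be run inside Blender's Python environment. Because `refine_with_budget` calls `revise` at most `max_steps` times, it terminates even when `check` never succeeds.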
