Architect: Generating Vivid and Interactive 3D Scenes with Hierarchical 2D Inpainting
Yian Wang, Xiaowen Qiu, Jiageng Liu, Zhehuan Chen, Jiting Cai, Yufei Wang, Tsun-Hsuan Wang, Zhou Xian, Chuang Gan

TL;DR
Architect introduces a novel framework that uses hierarchical 2D inpainting with diffusion models and depth estimation to generate detailed, realistic 3D environments for robotics and AI applications, reducing manual effort and improving spatial reasoning.
Contribution
The paper presents a new generative approach combining 2D diffusion-based inpainting and 3D lifting for creating complex 3D scenes from various inputs.
Findings
Enables flexible scene generation from text, floor plans, or existing environments.
Produces realistic 3D scenes with detailed furniture and objects.
Iterative inpainting refines scene composition effectively.
Abstract
Creating large-scale interactive 3D environments is essential for the development of Robotics and Embodied AI research. Current methods, including manual design, procedural generation, diffusion-based scene generation, and large language model (LLM) guided scene design, are hindered by limitations such as excessive human effort, reliance on predefined rules or training datasets, and limited 3D spatial reasoning ability. Since pre-trained 2D image generative models better capture scene and object configuration than LLMs, we address these challenges by introducing Architect, a generative framework that creates complex and realistic 3D embodied environments leveraging diffusion-based 2D image inpainting. In detail, we utilize foundation visual perception models to obtain each generated object from the image and leverage pre-trained depth estimation models to lift the generated 2D image to…
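The lifting step mentioned in the abstract (segment an object in the generated image, then use estimated depth to place it in 3D) can be illustrated with a standard pinhole-camera back-projection. The following is a minimal sketch and not the authors' implementation: the function names `unproject_depth` and `bounding_box`, the intrinsics `fx`, `fy`, `cx`, `cy`, and the toy depth map are all assumptions made for illustration, and the depth values and mask would in practice come from pre-trained depth estimation and perception models.

```python
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float, float]


def unproject_depth(
    depth: Sequence[Sequence[float]],
    fx: float,
    fy: float,
    cx: float,
    cy: float,
    mask: Optional[Sequence[Sequence[bool]]] = None,
) -> List[Point]:
    """Back-project a depth map to camera-frame 3D points (pinhole model).

    All parameter names are hypothetical. If `mask` is given, only pixels
    where the mask is True (e.g. one segmented object) are lifted.
    """
    points: List[Point] = []
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            if z <= 0:
                continue  # skip invalid depth
            if mask is not None and not mask[v][u]:
                continue  # pixel does not belong to the object
            x = (u - cx) * z / fx
            y = (v - cy) * z / fy
            points.append((x, y, z))
    return points


def bounding_box(points: Sequence[Point]) -> Tuple[Point, Point]:
    """Axis-aligned 3D bounding box (min corner, max corner) of the points."""
    if not points:
        raise ValueError("no points to bound")
    xs, ys, zs = zip(*points)
    return (min(xs), min(ys), min(zs)), (max(xs), max(ys), max(zs))


# Toy example: a 2x2 depth map at constant depth 2.0.
toy_depth = [[2.0, 2.0], [2.0, 2.0]]
toy_points = unproject_depth(toy_depth, fx=1.0, fy=1.0, cx=0.5, cy=0.5)
toy_box = bounding_box(toy_points)
```

In a pipeline of this kind, the bounding box of each masked object would give an approximate position and size for placing a 3D asset; how Architect actually does this is not specified in the truncated abstract above.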
Taxonomy
Topics: 3D Shape Modeling and Analysis · Computer Graphics and Visualization Techniques · Human Motion and Animation
Methods: Inpainting
