Text2Room: Extracting Textured 3D Meshes from 2D Text-to-Image Models
Lukas Höllein, Ang Cao, Andrew Owens, Justin Johnson, Matthias Nießner

TL;DR
Text2Room is a novel method that generates complete, textured 3D room-scale meshes from text prompts by synthesizing images, estimating depth, and fusing views into a seamless 3D scene.
Contribution
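Text2Room creates full 3D scenes with multiple objects and textures from a text prompt by combining pre-trained 2D models (text-to-image generation, text-conditioned inpainting, and monocular depth estimation) with a continuous alignment strategy that fuses each new frame into the existing geometry.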
Findings
First to generate room-scale textured 3D scenes from text
Produces seamless, multi-object 3D meshes with explicit geometry
Outperforms existing methods in qualitative and quantitative metrics
Abstract
We present Text2Room, a method for generating room-scale textured 3D meshes from a given text prompt as input. To this end, we leverage pre-trained 2D text-to-image models to synthesize a sequence of images from different poses. In order to lift these outputs into a consistent 3D scene representation, we combine monocular depth estimation with a text-conditioned inpainting model. The core idea of our approach is a tailored viewpoint selection such that the content of each image can be fused into a seamless, textured 3D mesh. More specifically, we propose a continuous alignment strategy that iteratively fuses scene frames with the existing geometry to create a seamless mesh. Unlike existing works that focus on generating single objects or zoom-out trajectories from text, our method generates complete 3D scenes with multiple objects and explicit 3D geometry. We evaluate our approach using…
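To make the "lift into 3D" and "alignment" steps from the abstract more concrete, here is a minimal sketch, not the authors' implementation. It assumes a pinhole camera model and uses a least-squares scale-and-shift fit as one plausible way to bring a monocular depth prediction into agreement with depth rendered from the existing mesh; the abstract does not specify how alignment is done. The function names `backproject_depth` and `align_scale_shift` and all parameter names are hypothetical.

```python
import numpy as np


def backproject_depth(depth, fx, fy, cx, cy):
    """Lift an (H, W) depth map to an (H, W, 3) array of camera-space points.

    Assumes a pinhole camera with focal lengths (fx, fy) and principal
    point (cx, cy). These names are illustrative, not from the paper.
    """
    h, w = depth.shape
    # Pixel grid: u indexes columns (x axis), v indexes rows (y axis).
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    x = (u - cx) / fx * depth
    y = (v - cy) / fy * depth
    return np.stack([x, y, depth], axis=-1)


def align_scale_shift(predicted, rendered, valid):
    """Fit scale s and shift t so that s * predicted + t matches rendered.

    The fit is least-squares over pixels where `valid` is True, i.e. where
    the existing geometry already provides a depth value. This is an assumed
    alignment scheme chosen for illustration.
    """
    p = predicted[valid].ravel()
    r = rendered[valid].ravel()
    # Solve [p, 1] @ [s, t]^T ~= r for the two unknowns.
    A = np.stack([p, np.ones_like(p)], axis=1)
    (s, t), *_ = np.linalg.lstsq(A, r, rcond=None)
    return s, t


if __name__ == "__main__":
    # Toy data: the "rendered" depth is an affine function of the prediction.
    predicted = np.array([[1.0, 2.0], [3.0, 4.0]])
    rendered = 2.0 * predicted + 0.5
    valid = np.array([[True, True], [True, False]])  # last pixel unobserved

    s, t = align_scale_shift(predicted, rendered, valid)
    aligned = s * predicted + t
    points = backproject_depth(aligned, fx=1.0, fy=1.0, cx=0.0, cy=0.0)
    print(points.shape)
```

In a pipeline of the kind the abstract describes, the aligned depth for newly inpainted pixels would be back-projected and then triangulated into mesh faces; that fusion step is omitted here.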
Videos
Text2Room: Extracting Textured 3D Meshes from 2D Text-to-Image Models (YouTube)
Taxonomy
Topics: Computer Graphics and Visualization Techniques · Image Processing and 3D Reconstruction · 3D Surveying and Cultural Heritage
Methods: Diffusion · Inpainting
