SceneTeller: Language-to-3D Scene Generation
Başak Melis Öcal, Maxim Tatarchenko, Sezer Karaoglu, Theo Gevers

TL;DR
SceneTeller introduces a novel text-to-3D scene generation system that enables users to create and modify high-quality indoor 3D scenes through natural language prompts, making 3D design accessible to non-experts.
Contribution
It pioneers a text-based approach to 3D room design that combines in-context learning, CAD model retrieval, and stylization, advancing the democratization of 3D scene creation.
Findings
Produces state-of-the-art 3D scenes from text prompts
Allows easy scene modification with additional text prompts
Accessible to users without professional 3D design skills
Abstract
Designing high-quality indoor 3D scenes is important in many practical applications, such as room planning or game development. Conventionally, this has been a time-consuming process that requires both artistic skill and familiarity with professional software, making it hardly accessible to lay users. However, recent advances in generative AI have established a solid foundation for democratizing 3D design. In this paper, we propose a pioneering approach for text-based 3D room design. Given a prompt in natural language describing the object placement in the room, our method produces a high-quality 3D scene corresponding to it. With an additional text prompt, users can change the appearance of the entire scene or of individual objects in it. Built using in-context learning, CAD model retrieval and 3D-Gaussian-Splatting-based stylization, our turnkey pipeline produces…
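The abstract names CAD model retrieval as one stage of the pipeline but does not describe how it works. As a rough illustration only, the sketch below assumes that an earlier stage has already turned the text prompt into a list of objects with a category, position, and bounding-box size, and that retrieval picks the catalog model of the same category whose dimensions are closest. The names `PlacedObject`, `CadModel`, and `retrieve_cad`, the catalog entries, and the nearest-size criterion are all hypothetical and are not taken from the paper.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class PlacedObject:
    """One object of a predicted room layout (hypothetical structure)."""
    category: str
    center: Vec3  # position in the room, metres
    size: Vec3    # bounding-box extents, metres


@dataclass
class CadModel:
    """One entry of a CAD catalog (hypothetical structure)."""
    model_id: str
    category: str
    size: Vec3


def retrieve_cad(obj: PlacedObject, catalog: List[CadModel]) -> Optional[CadModel]:
    """Return the same-category model with the closest bounding-box size.

    Assumption: Euclidean distance between size vectors is the matching
    criterion. Returns None when the catalog has no model of that category.
    """
    candidates = [m for m in catalog if m.category == obj.category]
    if not candidates:
        return None
    return min(candidates, key=lambda m: math.dist(m.size, obj.size))


# Toy catalog and layout, made up for illustration.
catalog = [
    CadModel("bed_small", "bed", (1.0, 0.5, 2.0)),
    CadModel("bed_large", "bed", (1.8, 0.6, 2.1)),
    CadModel("desk_01", "desk", (1.2, 0.75, 0.6)),
]
layout = [
    PlacedObject("bed", (0.0, 0.3, 1.0), (1.7, 0.6, 2.0)),
    PlacedObject("desk", (2.0, 0.4, 0.5), (1.1, 0.7, 0.6)),
]

# Pair every placed object with its retrieved model.
scene = [(obj, retrieve_cad(obj, catalog)) for obj in layout]
for obj, model in scene:
    print(obj.category, "->", model.model_id if model else "no match")
```

In this toy setup, a stylization stage would then operate on the assembled `scene`; that stage is outside the scope of the sketch.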
Taxonomy
Topics: Human Motion and Animation · Multimodal Machine Learning Applications · 3D Modeling in Geospatial Applications
