SceneSmith: Agentic Generation of Simulation-Ready Indoor Scenes

Nicholas Pfaff; Thomas Cohn; Sergey Zakharov; Rick Cory; Russ Tedrake

arXiv:2602.09153·cs.RO·February 12, 2026


TL;DR

SceneSmith is a hierarchical framework that generates diverse, physically realistic indoor scenes from natural language prompts, significantly improving scene complexity and realism for robotic simulation.

Contribution

SceneSmith introduces an agentic, multi-stage approach that combines text-to-3D synthesis for static objects, dataset retrieval for articulated objects, and physical property estimation to create detailed, simulation-ready indoor environments.

Findings

01. Generates 3-6x more objects than prior methods
02. Achieves less than 2% inter-object collisions
03. 96% of objects remain stable under physics simulation

Abstract

Simulation has become a key tool for training and evaluating home robots at scale, yet existing environments fail to capture the diversity and physical complexity of real indoor spaces. Current scene synthesis methods produce sparsely furnished rooms that lack the dense clutter, articulated furniture, and physical properties essential for robotic manipulation. We introduce SceneSmith, a hierarchical agentic framework that generates simulation-ready indoor environments from natural language prompts. SceneSmith constructs scenes through successive stages – from architectural layout to furniture placement to small object population – each implemented as an interaction among VLM agents: designer, critic, and orchestrator. The framework tightly integrates asset generation through text-to-3D synthesis for static objects, dataset retrieval for articulated objects,…
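The staged designer/critic/orchestrator interaction described in the abstract can be pictured with a small sketch. This is not SceneSmith's implementation: only the three stage names and the three agent roles come from the abstract. The `Scene` class, the `orchestrate` function, the callable signatures, the `max_rounds` limit, and the stub agents are all hypothetical, and the stubs stand in for what would be VLM calls.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# Stage names follow the abstract; everything else here is an assumption.
STAGES = ["architectural layout", "furniture placement", "small object population"]


@dataclass
class Scene:
    """Hypothetical container for the scene built so far."""
    prompt: str
    items: List[str] = field(default_factory=list)


# Hypothetical agent signatures: (scene, stage, data) -> list of strings.
Designer = Callable[[Scene, str, List[str]], List[str]]  # data = critic feedback
Critic = Callable[[Scene, str, List[str]], List[str]]    # data = designer proposal


def orchestrate(prompt: str, designer: Designer, critic: Critic,
                max_rounds: int = 3) -> Scene:
    """Run each stage in order; within a stage, loop designer -> critic
    until the critic reports no issues or max_rounds is reached."""
    scene = Scene(prompt)
    for stage in STAGES:
        feedback: List[str] = []
        proposal: List[str] = []
        for _ in range(max_rounds):
            proposal = designer(scene, stage, feedback)
            feedback = critic(scene, stage, proposal)
            if not feedback:  # critic accepted the proposal
                break
        scene.items.extend(proposal)
    return scene


def stub_designer(scene: Scene, stage: str, feedback: List[str]) -> List[str]:
    # Stand-in for a VLM designer: marks the proposal as revised after feedback.
    suffix = " (revised)" if feedback else ""
    return [f"{stage} proposal{suffix}"]


def stub_critic(scene: Scene, stage: str, proposal: List[str]) -> List[str]:
    # Stand-in for a VLM critic: rejects the first draft, accepts the revision.
    return [] if proposal[0].endswith("(revised)") else ["please revise"]


scene = orchestrate("a cluttered kitchen", stub_designer, stub_critic)
```

With these stubs, each stage takes two rounds: the critic rejects the first draft and accepts the revision, so `scene.items` ends up with one revised proposal per stage, in stage order.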


Code & Models

Datasets

nepfaff/scenesmith-example-scenes
dataset · 5.3k downloads


Taxonomy

Topics: Multimodal Machine Learning Applications · Robot Manipulation and Learning · Social Robot Interaction and HRI