Steerable Scene Generation with Post Training and Inference-Time Search
Nicholas Pfaff, Hongkai Dai, Sergey Zakharov, Shun Iwase, Russ Tedrake

TL;DR
This paper presents a flexible, diffusion-based scene generation method that can be steered towards specific robotic tasks through post-training and inference-time search, enabling realistic, goal-oriented scene synthesis.
Contribution
It introduces a unified diffusion model for scene generation that can be adapted through reinforcement-learning-based post-training, conditional generation, and a novel inference strategy based on Monte Carlo tree search (MCTS).
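To make the idea of inference-time search concrete, here is a generic UCT-style Monte Carlo tree search on a hypothetical toy problem. It is not the paper's algorithm. Partial scenes are tree nodes, each action adds one asset (or stops), and the objective rewards clutter, mirroring the "high-clutter" goal mentioned in the abstract. The asset footprints, table area, `MAX_OBJECTS`, and the uniformly random expansion and rollout policies are all illustrative assumptions. The summary does not say how the diffusion model is used inside the search, so those policies are placeholders.

```python
import math
import random

# Hypothetical toy problem: build a cluttered tabletop one asset at a time.
FOOTPRINTS = [0.30, 0.15, 0.10, 0.05]  # hypothetical asset footprints (m^2)
TABLE_AREA = 0.60                      # hypothetical usable table area (m^2)
MAX_OBJECTS = 8
STOP = -1                              # action that finalizes the scene


def legal_actions(scene):
    """Assets that still fit on the table, plus STOP."""
    if len(scene) >= MAX_OBJECTS:
        return [STOP]
    used = sum(FOOTPRINTS[a] for a in scene)
    fits = [a for a, f in enumerate(FOOTPRINTS) if used + f <= TABLE_AREA + 1e-9]
    return fits + [STOP]


def reward(scene):
    """Toy task objective: more objects means more clutter, normalized to [0, 1]."""
    return len(scene) / MAX_OBJECTS


class Node:
    def __init__(self, scene, done, parent=None):
        self.scene, self.done, self.parent = scene, done, parent
        self.children = {}  # action -> Node
        self.untried = [] if done else legal_actions(scene)
        self.visits, self.value = 0, 0.0

    def child(self, action):
        if action == STOP:
            return Node(self.scene, True, self)
        return Node(self.scene + (action,), False, self)


def uct_select(node, c=1.4):
    """Pick the child maximizing mean value plus the UCB1 exploration bonus."""
    return max(
        node.children.values(),
        key=lambda ch: ch.value / ch.visits
        + c * math.sqrt(math.log(node.visits) / ch.visits),
    )


def rollout(scene, done, rng):
    """Random playout from a (possibly partial) scene to a finished scene."""
    while not done:
        a = rng.choice(legal_actions(scene))
        if a == STOP:
            done = True
        else:
            scene = scene + (a,)
    return scene


def mcts(iterations=2000, seed=0):
    rng = random.Random(seed)
    root = Node((), False)
    best_scene, best_r = (), -1.0
    for _ in range(iterations):
        node = root
        # 1. Selection: descend while fully expanded (terminal nodes have no children).
        while not node.untried and node.children:
            node = uct_select(node)
        # 2. Expansion: add one untried child, chosen at random.
        if node.untried:
            a = node.untried.pop(rng.randrange(len(node.untried)))
            node.children[a] = node.child(a)
            node = node.children[a]
        # 3. Simulation: finish the scene randomly and score it.
        final = rollout(node.scene, node.done, rng)
        r = reward(final)
        if r > best_r:
            best_scene, best_r = final, r
        # 4. Backpropagation: update statistics up to the root.
        while node is not None:
            node.visits += 1
            node.value += r
            node = node.parent
    return best_scene, best_r


scene, score = mcts(iterations=2000, seed=0)
print(scene, score)
```

The search keeps the best finished scene it has seen rather than reading a final move off the root, which suits a single-agent design problem where only the final scene matters. Swapping `reward` for a different task objective steers the search without retraining anything, which is the appeal of inference-time search.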
Findings
Generated over 44 million diverse, physically feasible scenes.
Achieved goal-directed scene synthesis across multiple environments.
Demonstrated effective steering of scene generation towards task-specific objectives.
Abstract
Training robots in simulation requires diverse 3D scenes that reflect the specific challenges of downstream tasks. However, scenes that satisfy strict task requirements, such as high-clutter environments with plausible spatial arrangement, are rare and costly to curate manually. Instead, we generate large-scale scene data using procedural models that approximate realistic environments for robotic manipulation, and adapt it to task-specific goals. We do this by training a unified diffusion-based generative model that predicts which objects to place from a fixed asset library, along with their SE(3) poses. This model serves as a flexible scene prior that can be adapted using reinforcement-learning-based post-training, conditional generation, or inference-time search, steering generation toward downstream objectives even when they differ from the original data distribution. Our method…
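One way to picture "which objects to place from a fixed asset library, along with their SE(3) poses" is a fixed number of object slots, each holding an asset label plus a pose. The sketch below flattens each slot into a one-hot asset vector (with an extra "empty" category for unused slots), an xyz translation, and a unit quaternion in assumed (w, x, y, z) order, then applies the standard DDPM forward noising step q(x_t | x_0) with a linear beta schedule. The asset names, slot count, rotation encoding, and schedule values are hypothetical choices for illustration, not the paper's.

```python
import math
import random

ASSETS = ["plate", "mug", "bowl"]   # hypothetical fixed asset library
EMPTY = len(ASSETS)                 # extra category marking an unused slot
OBJ_DIM = len(ASSETS) + 1 + 3 + 4   # one-hot (+ empty) | translation | quaternion


def encode_object(asset_id, translation, quaternion):
    """Flatten one slot: one-hot asset id, xyz translation, unit quaternion (w, x, y, z)."""
    one_hot = [1.0 if i == asset_id else 0.0 for i in range(len(ASSETS) + 1)]
    norm = math.sqrt(sum(q * q for q in quaternion))
    return one_hot + list(translation) + [q / norm for q in quaternion]


def encode_scene(objects, num_slots):
    """Pad a list of (asset_id, translation, quaternion) tuples to num_slots rows."""
    slots = [encode_object(*obj) for obj in objects]
    while len(slots) < num_slots:
        slots.append(encode_object(EMPTY, (0.0, 0.0, 0.0), (1.0, 0.0, 0.0, 0.0)))
    return slots


def alpha_bar(t, num_steps=1000, beta_min=1e-4, beta_max=0.02):
    """Product of (1 - beta_s) over the first t steps of a linear schedule; t=0 gives 1."""
    prod = 1.0
    for s in range(t):
        beta = beta_min + (beta_max - beta_min) * s / (num_steps - 1)
        prod *= 1.0 - beta
    return prod


def q_sample(x0, t, rng):
    """Draw x_t ~ N(sqrt(abar_t) * x_0, (1 - abar_t) * I), independently per entry."""
    ab = alpha_bar(t)
    return [[math.sqrt(ab) * v + math.sqrt(1.0 - ab) * rng.gauss(0.0, 1.0) for v in row]
            for row in x0]


rng = random.Random(0)
scene = encode_scene(
    [(0, (0.10, 0.20, 0.75), (1.0, 0.0, 0.0, 0.0)),     # a plate, unrotated
     (1, (-0.20, 0.00, 0.78), (0.92, 0.0, 0.0, 0.38))],  # a mug, rotated about z
    num_slots=3,
)
noisy = q_sample(scene, t=500, rng=rng)
```

Treating the one-hot asset vector as a continuous quantity is one common relaxation. A mixed discrete-continuous diffusion that corrupts asset labels with a categorical process is another option, and the summary above does not say which the authors use. A denoising network trained to reverse this process would then output both the object choices and their poses in one pass.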
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Robot Manipulation and Learning · 3D Shape Modeling and Analysis
Methods: Diffusion
