Steerable Scene Generation with Post Training and Inference-Time Search

Nicholas Pfaff; Hongkai Dai; Sergey Zakharov; Shun Iwase; Russ Tedrake

arXiv:2505.04831·cs.RO·August 27, 2025

Open Access

TL;DR

This paper presents a flexible, diffusion-based scene generation method that can be steered towards specific robotic tasks through post-training and inference-time search, enabling realistic, goal-oriented scene synthesis.

Contribution

It introduces a unified diffusion model for scene generation that can be adapted with reinforcement learning, conditional generation, and a novel MCTS-based inference strategy.

Findings

01. Generated over 44 million diverse, physically feasible scenes.

02. Achieved goal-directed scene synthesis across multiple environments.

03. Demonstrated effective steering of scene generation towards task-specific objectives.

Abstract

Training robots in simulation requires diverse 3D scenes that reflect the specific challenges of downstream tasks. However, scenes that satisfy strict task requirements, such as high-clutter environments with plausible spatial arrangement, are rare and costly to curate manually. Instead, we generate large-scale scene data using procedural models that approximate realistic environments for robotic manipulation, and adapt it to task-specific goals. We do this by training a unified diffusion-based generative model that predicts which objects to place from a fixed asset library, along with their SE(3) poses. This model serves as a flexible scene prior that can be adapted using reinforcement learning-based post training, conditional generation, or inference-time search, steering generation toward downstream objectives even when they differ from the original data distribution. Our method…
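
To make the abstract's description concrete, here is a minimal sketch of a scene as a list of objects, each with an index into a fixed asset library and an SE(3) pose, together with a very simple form of inference-time search. Everything named here is an assumption for illustration only: the asset names, the quaternion pose parameterization, the random `sample_scene` stand-in (it is not the diffusion model), and the toy `clutter_reward`. The search shown is plain best-of-N sampling, a simpler substitute for the MCTS-based strategy the paper describes, not a reproduction of it.

```python
import math
import random
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Hypothetical asset library; the paper's actual library is not listed here.
ASSET_LIBRARY = ["plate", "mug", "bowl", "fork"]


@dataclass(frozen=True)
class PlacedObject:
    asset_index: int                                # index into the fixed asset library
    translation: Tuple[float, float, float]         # position (units assumed to be metres)
    quaternion: Tuple[float, float, float, float]   # unit quaternion (w, x, y, z), assumed parameterization


Scene = List[PlacedObject]


def random_unit_quaternion(rng: random.Random) -> Tuple[float, float, float, float]:
    # Normalising a 4D Gaussian sample gives a uniformly random rotation.
    q = [rng.gauss(0.0, 1.0) for _ in range(4)]
    norm = math.sqrt(sum(c * c for c in q))
    return (q[0] / norm, q[1] / norm, q[2] / norm, q[3] / norm)


def sample_scene(rng: random.Random, max_objects: int = 6) -> Scene:
    """Random stand-in for the learned scene prior (NOT the diffusion model)."""
    n = rng.randint(1, max_objects)
    return [
        PlacedObject(
            asset_index=rng.randrange(len(ASSET_LIBRARY)),
            translation=(rng.uniform(-0.5, 0.5), rng.uniform(-0.5, 0.5), 0.0),
            quaternion=random_unit_quaternion(rng),
        )
        for _ in range(n)
    ]


def clutter_reward(scene: Scene) -> float:
    """Toy downstream objective: more objects counts as more clutter."""
    return float(len(scene))


def best_of_n(rng: random.Random,
              reward_fn: Callable[[Scene], float],
              n_samples: int = 32) -> Scene:
    """Draw n_samples scenes from the prior and keep the highest-reward one."""
    candidates = [sample_scene(rng) for _ in range(n_samples)]
    return max(candidates, key=reward_fn)


if __name__ == "__main__":
    best = best_of_n(random.Random(0), clutter_reward)
    for obj in best:
        print(ASSET_LIBRARY[obj.asset_index], obj.translation)
```

The point of the sketch is the separation of concerns: the prior proposes scenes, and a task-specific reward chosen at inference time decides which ones to keep, without retraining the prior. A tree search such as MCTS would spend its sampling budget more selectively than this exhaustive best-of-N loop.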

Code & Models

Repositories

nepfaff/steerable-scene-generation (PyTorch, official)

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Robot Manipulation and Learning · 3D Shape Modeling and Analysis

Methods: Diffusion