Diffusion-based Generation, Optimization, and Planning in 3D Scenes
Siyuan Huang, Zan Wang, Puhao Li, Baoxiong Jia, Tengyu Liu, Yixin Zhu, Wei Liang, Song-Chun Zhu

TL;DR
SceneDiffuser is a novel diffusion-based model that unifies 3D scene generation, optimization, and planning, offering scene-awareness, physics-based reasoning, and goal orientation for diverse 3D understanding tasks.
Contribution
The paper introduces SceneDiffuser, a fully differentiable, scene-aware diffusion model that integrates generation, optimization, and planning in 3D scenes; the authors report that it outperforms prior methods on the evaluated tasks.
Findings
Significant improvements in human pose and motion generation.
Enhanced dexterous grasp generation.
Superior path and motion planning results.
Abstract
We introduce SceneDiffuser, a conditional generative model for 3D scene understanding. SceneDiffuser provides a unified model for solving scene-conditioned generation, optimization, and planning. In contrast to prior works, SceneDiffuser is intrinsically scene-aware, physics-based, and goal-oriented. With an iterative sampling strategy, SceneDiffuser jointly formulates the scene-aware generation, physics-based optimization, and goal-oriented planning via a diffusion-based denoising process in a fully differentiable fashion. Such a design alleviates the discrepancies among different modules and the posterior collapse of previous scene-conditioned generative models. We evaluate SceneDiffuser with various 3D scene understanding tasks, including human pose and motion generation, dexterous grasp generation, path planning for 3D navigation, and motion planning for robot arms. The results show…
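The abstract describes folding optimization and planning objectives into an iterative denoising process. The following is a minimal sketch of that general idea (objective-guided diffusion sampling), not the authors' implementation. Every name here is hypothetical: `toy_eps_model` is an analytic Gaussian stand-in for a learned scene-conditioned denoiser, and `goal_objective_grad` is a simple quadratic stand-in for the physics and goal objectives; the noise schedule and guidance scaling are also assumptions.

```python
import numpy as np

def make_schedule(num_steps=50, beta_start=1e-4, beta_end=0.05):
    # Linear variance schedule (an assumption; the source does not specify one).
    betas = np.linspace(beta_start, beta_end, num_steps)
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)
    return betas, alphas, alpha_bars

def toy_eps_model(x_t, alpha_bar_t, data_mean, data_std):
    # Hypothetical stand-in for a learned denoiser: the exact noise
    # prediction when clean data follows N(data_mean, data_std^2 I).
    var_t = alpha_bar_t * data_std**2 + (1.0 - alpha_bar_t)
    return np.sqrt(1.0 - alpha_bar_t) * (x_t - np.sqrt(alpha_bar_t) * data_mean) / var_t

def goal_objective_grad(x, goal):
    # Gradient of the toy objective -0.5 * ||x - goal||^2 with respect to x.
    return goal - x

def guided_sample(goal, guidance_scale, num_samples=256,
                  data_std=1.0, num_steps=50, seed=0):
    rng = np.random.default_rng(seed)
    betas, alphas, alpha_bars = make_schedule(num_steps)
    data_mean = np.zeros_like(goal, dtype=float)
    # Start from pure Gaussian noise.
    x = rng.standard_normal((num_samples, goal.shape[0]))
    for t in reversed(range(num_steps)):
        eps = toy_eps_model(x, alpha_bars[t], data_mean, data_std)
        # Standard DDPM posterior mean computed from the predicted noise.
        mean = (x - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps) / np.sqrt(alphas[t])
        # Guidance: shift the mean along the objective gradient,
        # scaled by the step variance.
        mean = mean + guidance_scale * betas[t] * goal_objective_grad(x, goal)
        # No noise is added on the final step.
        noise = rng.standard_normal(x.shape) if t > 0 else np.zeros_like(x)
        x = mean + np.sqrt(betas[t]) * noise
    return x

if __name__ == "__main__":
    goal = np.array([3.0, 0.0])
    for scale in (0.0, 2.0):
        samples = guided_sample(goal, scale)
        dist = np.linalg.norm(samples.mean(axis=0) - goal)
        print(f"guidance_scale={scale}: distance of sample mean to goal = {dist:.3f}")
```

With `guidance_scale=0.0` the loop reduces to plain ancestral sampling from the toy data distribution; a positive scale pulls samples toward `goal` while the denoiser keeps them near the data distribution, which is the trade-off a guided sampler has to balance.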
Taxonomy
Topics: Human Pose and Action Recognition · Human Motion and Animation · Advanced Vision and Imaging
