PAT3D: Physics-Augmented Text-to-3D Scene Generation
Guying Lin, Kemeng Huang, Michael Liu, Ruihan Gao, Hanke Chen, Lyuhao Chen, Beijia Lu, Taku Komura, Yuan Liu, Jun-Yan Zhu, Minchen Li

TL;DR
PAT3D is a novel framework that combines vision-language models with physics simulation to generate realistic, physically stable, and intersection-free 3D scenes from text prompts, suitable for downstream applications.
Contribution
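PAT3D is presented as the first physics-augmented text-to-3D scene generation method. It integrates a differentiable rigid-body simulator into the generation pipeline and adds a simulation-in-the-loop optimization procedure that improves physical stability and semantic consistency with the prompt.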
Findings
Outperforms prior methods in physical plausibility and semantic accuracy.
Produces simulation-ready 3D scenes suitable for editing and robotics.
Uses a differentiable rigid-body simulator to drive scenes toward static equilibrium under gravity, without interpenetrations.
Abstract
We introduce PAT3D, the first physics-augmented text-to-3D scene generation framework that integrates vision-language models with physics-based simulation to produce physically plausible, simulation-ready, and intersection-free 3D scenes. Given a text prompt, PAT3D generates 3D objects, infers their spatial relations, and organizes them into a hierarchical scene tree, which is then converted into initial conditions for simulation. A differentiable rigid-body simulator ensures realistic object interactions under gravity, driving the scene toward static equilibrium without interpenetrations. To further enhance scene quality, we introduce a simulation-in-the-loop optimization procedure that guarantees physical stability and non-intersection, while improving semantic consistency with the input prompt. Experiments demonstrate that PAT3D substantially outperforms prior approaches in physical…
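The step from a hierarchical scene tree to simulation initial conditions can be illustrated with a toy example. The sketch below is not PAT3D's implementation: the `Node` class, the `gap` parameter, and both functions are hypothetical names introduced here, the scene is reduced to vertical stacking of bounding boxes, and `settle` is a trivial stand-in for the paper's differentiable rigid-body simulator. It only shows the idea of starting every object slightly above its support (so the initial state is intersection-free) and then letting it come to rest.

```python
from __future__ import annotations
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical scene-tree node: an object and the objects resting on it."""
    name: str
    height: float                                   # vertical extent of the bounding box
    children: list[Node] = field(default_factory=list)


def initial_base_heights(root: Node, ground: float = 0.0, gap: float = 0.05) -> dict[str, float]:
    """Give each object a base height `gap` above the top of its support.

    A positive gap means no object touches its support at the start,
    so the initial configuration has no interpenetration along z.
    """
    placements: dict[str, float] = {}

    def visit(node: Node, support_top: float) -> None:
        base = support_top + gap
        placements[node.name] = base
        for child in node.children:
            visit(child, base + node.height)        # children sit on this object's top face

    visit(root, ground)
    return placements


def settle(root: Node, ground: float = 0.0) -> dict[str, float]:
    """Toy stand-in for simulation: every object drops until it rests on its support."""
    return initial_base_heights(root, ground, gap=0.0)


# Example scene: a cup on a book on a table.
cup = Node("cup", 0.10)
book = Node("book", 0.05, [cup])
table = Node("table", 0.75, [book])

start = initial_base_heights(table)                 # floating slightly above supports
rest = settle(table)                                # resting in contact
```

In a real pipeline the settle step would be replaced by a physics simulation that also handles horizontal placement, contact, and friction; the tree traversal is only meant to show how parent-child support relations determine a collision-free starting state.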
