PAT3D: Physics-Augmented Text-to-3D Scene Generation

Guying Lin; Kemeng Huang; Michael Liu; Ruihan Gao; Hanke Chen; Lyuhao Chen; Beijia Lu; Taku Komura; Yuan Liu; Jun-Yan Zhu; Minchen Li

arXiv:2511.21978 · cs.CV · April 24, 2026

TL;DR

PAT3D combines vision-language models with physics simulation to generate physically stable, intersection-free 3D scenes from text prompts. The scenes are simulation-ready, which makes them usable for downstream tasks such as scene editing and robotics.

Contribution

PAT3D is presented as the first physics-augmented text-to-3D scene generation method. It integrates rigid-body simulation to improve realism and stability, and adds a simulation-in-the-loop optimization procedure.

Findings

01 Outperforms prior methods in physical plausibility and semantic accuracy.

02 Produces simulation-ready 3D scenes suitable for editing and robotics.

03 Uses a differentiable physics engine for scene stability.

Abstract

We introduce PAT3D, the first physics-augmented text-to-3D scene generation framework that integrates vision-language models with physics-based simulation to produce physically plausible, simulation-ready, and intersection-free 3D scenes. Given a text prompt, PAT3D generates 3D objects, infers their spatial relations, and organizes them into a hierarchical scene tree, which is then converted into initial conditions for simulation. A differentiable rigid-body simulator ensures realistic object interactions under gravity, driving the scene toward static equilibrium without interpenetrations. To further enhance scene quality, we introduce a simulation-in-the-loop optimization procedure that guarantees physical stability and non-intersection, while improving semantic consistency with the input prompt. Experiments demonstrate that PAT3D substantially outperforms prior approaches in physical…
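
The abstract mentions that a hierarchical scene tree is converted into initial conditions for simulation. The sketch below illustrates one simple way such a conversion might look; it is not taken from the PAT3D code. It assumes every object is approximated by an axis-aligned box, that every tree edge means "rests on top of", and that z points up. The names `SceneNode`, `initial_placements`, `boxes_overlap`, and the `gap` parameter are hypothetical.

```python
from dataclasses import dataclass, field
from itertools import combinations


@dataclass
class SceneNode:
    """Hypothetical scene-tree node: an object plus the objects resting on it."""
    name: str
    size: tuple[float, float, float]  # box extents along x, y, z (assumed units: metres)
    children: list["SceneNode"] = field(default_factory=list)


def initial_placements(root, gap=0.01):
    """Map each object name to the (x, y, z) centre of its box.

    Each child starts slightly above its parent's top face and siblings are
    spread along x. This is an illustrative heuristic only: it does not rule
    out overlaps for arbitrary trees (wide subtrees can collide).
    """
    placements = {}

    def place(node, cx, cy, base_z):
        placements[node.name] = (cx, cy, base_z + node.size[2] / 2)
        top = base_z + node.size[2] + gap  # children start just above the parent
        total = sum(c.size[0] for c in node.children) + gap * (len(node.children) - 1)
        x = cx - total / 2
        for child in node.children:
            place(child, x + child.size[0] / 2, cy, top)
            x += child.size[0] + gap

    place(root, 0.0, 0.0, 0.0)
    return placements


def boxes_overlap(c1, s1, c2, s2):
    """True if two axis-aligned boxes (centre, size) interpenetrate."""
    return all(abs(a - b) < (p + q) / 2 for a, b, p, q in zip(c1, c2, s1, s2))


# Example scene: an apple on a plate, with the plate and a bowl on a table.
apple = SceneNode("apple", (0.08, 0.08, 0.08))
plate = SceneNode("plate", (0.25, 0.25, 0.02), [apple])
bowl = SceneNode("bowl", (0.20, 0.20, 0.08))
table = SceneNode("table", (1.20, 0.80, 0.75), [plate, bowl])

nodes = {n.name: n for n in (table, plate, bowl, apple)}
placements = initial_placements(table)
overlapping = [
    (a, b)
    for a, b in combinations(nodes, 2)
    if boxes_overlap(placements[a], nodes[a].size, placements[b], nodes[b].size)
]
```

Starting each object a small gap above its support gives a simulator a non-intersecting initial state from which gravity can settle the scene. For this example `overlapping` is empty, though the heuristic does not ensure that in general.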

Code & Models

Repositories

Simulation-Intelligence/PAT3D
github

Datasets

guyingl/pat3d
dataset · 854 downloads

Videos

PAT3D: Physics-Augmented Text-to-3D Scene Generation· slideslive