Phys4D: Fine-Grained Physics-Consistent 4D Modeling from Video Diffusion

Haoran Lu; Shang Wu; Jianshu Zhang; Maojiang Su; Guo Ye; Chenwei Xu; Lie Lu; Pranav Maneriker; Fan Du; Manling Li; Zhaoran Wang; Han Liu

arXiv:2603.03485·cs.CV·March 9, 2026

TL;DR

Phys4D introduces a three-stage training pipeline that enhances video diffusion models with physics-consistent 4D representations, improving physical plausibility and dynamic coherence in generated videos.

Contribution

The paper proposes Phys4D, a three-stage method that combines large-scale pseudo-supervised pretraining, physics-grounded supervised fine-tuning on simulation-generated data, and simulation-grounded reinforcement learning to achieve fine-grained physical consistency in 4D video modeling.

Findings

01. Significant improvement in physical consistency over baselines.

02. Effective enforcement of temporally coherent 4D dynamics.

03. Maintains high-quality generative performance.

Abstract

Recent video diffusion models have achieved impressive capabilities as large-scale generative world models. However, these models often struggle with fine-grained physical consistency, exhibiting physically implausible dynamics over time. In this work, we present Phys4D, a pipeline for learning physics-consistent 4D world representations from video diffusion models. Phys4D adopts a three-stage training paradigm that progressively lifts appearance-driven video diffusion models into physics-consistent 4D world representations. We first bootstrap robust geometry and motion representations through large-scale pseudo-supervised pretraining, establishing a foundation for 4D scene modeling. We then perform physics-grounded supervised fine-tuning using simulation-generated data, enforcing temporally consistent 4D dynamics. Finally, we apply simulation-grounded reinforcement…
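
The stage ordering described in the abstract can be pictured as a simple sequential schedule. The sketch below is illustrative only and is not the authors' implementation: the `Stage` type, the stage labels (`pretrain`, `sft`, `rl`), the `run_pipeline` helper, and the placeholder update function are all hypothetical names introduced here. The abstract is truncated before it describes the third stage in detail, so that stage's training signal is left as a generic label.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class Stage:
    name: str         # hypothetical short label for the stage
    supervision: str  # source of the training signal, paraphrased from the abstract
    objective: str    # "supervised" or "reinforcement"


# Ordering follows the abstract: pretraining, then fine-tuning, then RL.
PHYS4D_STAGES: List[Stage] = [
    Stage("pretrain", "large-scale pseudo-supervision", "supervised"),
    Stage("sft", "simulation-generated data", "supervised"),
    Stage("rl", "simulation-grounded signal (details not given here)", "reinforcement"),
]

TrainFn = Callable[[dict, Stage], dict]


def run_pipeline(
    state: dict, stages: List[Stage], train_fns: Dict[str, TrainFn]
) -> Tuple[dict, List[str]]:
    """Apply each stage's training function in order and record the order run."""
    history: List[str] = []
    for stage in stages:
        state = train_fns[stage.objective](state, stage)
        history.append(stage.name)
    return state, history


def placeholder_update(state: dict, stage: Stage) -> dict:
    # Stand-in for real training: only counts how many stages touched the state.
    return {**state, "stages_seen": state.get("stages_seen", 0) + 1}


train_fns = {"supervised": placeholder_update, "reinforcement": placeholder_update}
final_state, history = run_pipeline({}, PHYS4D_STAGES, train_fns)
print(history)
```

Each stage starts from the state left by the previous one, which is the sense in which the abstract describes the model as being lifted "progressively"; real training functions would replace `placeholder_update`.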

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Model Reduction and Neural Networks · Face recognition and analysis