PhysVideo: Physically Plausible Video Generation with Cross-View Geometry Guidance

Cong Wang; Hanxin Zhu; Xiao Tang; Jiayi Luo; Xin Jin; Long Chen; Fei-Yue Wang; Zhibo Chen

arXiv:2603.18639 · cs.CV · March 20, 2026

TL;DR

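PhysVideo is a two-stage framework for physically plausible video generation. It first generates physics-aware orthogonal foreground videos using physics-aware attention and geometry-enhanced cross-view attention, then synthesizes full videos with background guided by those foreground videos. The authors report improved physical realism and spatio-temporal coherence in the generated videos.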

Contribution

The paper presents PhysVideo, a novel two-stage approach with physics-aware and geometry-enhanced modules, and introduces PhysMV, a large dataset for training physically consistent video models.

Findings

01 PhysVideo outperforms existing methods in physical realism.

02 The framework achieves higher spatio-temporal coherence.

03 Extensive experiments validate the effectiveness of PhysVideo.

Abstract

Recent progress in video generation has led to substantial improvements in visual fidelity, yet ensuring physically consistent motion remains a fundamental challenge. Intuitively, this limitation can be attributed to the fact that real-world object motion unfolds in three-dimensional space, while video observations provide only partial, view-dependent projections of such dynamics. To address this limitation, we propose PhysVideo, a two-stage framework that first generates physics-aware orthogonal foreground videos and then synthesizes full videos with background. In the first stage, Phys4View leverages physics-aware attention to capture the influence of physical attributes on motion dynamics, and enhances spatio-temporal consistency by incorporating geometry-enhanced cross-view attention and temporal attention. In the second stage, VideoSyn uses the generated foreground videos as guidance…
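
To make the idea of cross-view attention concrete, here is a minimal sketch in plain Python. It is not the paper's Phys4View module: the function name `cross_view_attention`, the choice of four views, the token layout, and the absence of learned projections and geometry terms are all assumptions made for illustration. The sketch only shows the generic mechanism, in which a token in one view attends to the tokens at the same index in every view so that features are shared across views.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_view_attention(views):
    """Hypothetical, simplified cross-view attention (no learned weights).

    views[v][n] is the feature vector (list of floats) of token n in view v.
    Each token attends to the same-index token in every view, using
    scaled dot-product scores, and returns the weighted average.
    """
    num_views = len(views)
    num_tokens = len(views[0])
    dim = len(views[0][0])
    fused = [[None] * num_tokens for _ in range(num_views)]
    for n in range(num_tokens):
        # Gather token n from every view: these act as keys and values.
        column = [views[v][n] for v in range(num_views)]
        for v, query in enumerate(column):
            scores = [
                sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
                for key in column
            ]
            weights = softmax(scores)
            fused[v][n] = [
                sum(w * key[d] for w, key in zip(weights, column))
                for d in range(dim)
            ]
    return fused


# Illustrative input: 4 views, 16 tokens per view, 8 channels (all assumed).
random.seed(0)
views = [
    [[random.gauss(0.0, 1.0) for _ in range(8)] for _ in range(16)]
    for _ in range(4)
]
fused = cross_view_attention(views)
```

The output keeps the input layout (views, tokens, channels). A practical model would add learned query, key, and value projections and some form of geometric conditioning, neither of which is specified in the text available here.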

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Human Motion and Animation · 3D Shape Modeling and Analysis