PhysAlign: Physics-Coherent Image-to-Video Generation through Feature and 3D Representation Alignment
Zhexiao Xiong, Yizhi Song, Liu He, Wei Xiong, Yu Yuan, Feng Qiao, Nathan Jacobs

TL;DR
PhysAlign introduces a physics-coherent image-to-video generation framework that uses synthetic data and explicit 3D constraints to produce temporally stable videos aligned with physical laws.
Contribution
PhysAlign presents a physics-grounded approach to image-to-video generation that uses a synthetic, simulation-derived dataset and a unified physical latent space to improve temporal coherence.
Findings
Outperforms existing models in physical reasoning tasks
Achieves higher temporal stability without sacrificing visual quality
Bridges the gap between visual synthesis and physical kinematics
Abstract
Video Diffusion Models (VDMs) offer a promising approach for simulating dynamic scenes and environments, with broad applications in robotics and media generation. However, existing models often generate temporally incoherent content that violates basic physical intuition, significantly limiting their practical applicability. We propose PhysAlign, an efficient framework for physics-coherent image-to-video (I2V) generation that explicitly addresses this limitation. To overcome the critical scarcity of physics-annotated videos, we first construct a fully controllable synthetic data generation pipeline based on rigid-body simulation, yielding a highly-curated dataset with accurate, fine-grained physics and 3D annotations. Leveraging this data, PhysAlign constructs a unified physical latent space by coupling explicit 3D geometry constraints with a Gram-based spatio-temporal relational…
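The abstract is cut off in this listing before it defines the Gram-based relational term, so the exact formulation is not available here. As a rough illustration of what a Gram-matrix relational alignment generally looks like, the sketch below compares the pairwise cosine-similarity structure of two sets of token features. The function names, the choice of cosine similarity, and the mean-squared-error reduction are all assumptions made for illustration, not the authors' method.

```python
import math


def normalize(vec):
    """Scale a feature vector to unit L2 norm (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def gram_matrix(tokens):
    """Pairwise cosine similarities between tokens.

    `tokens` is a list of N feature vectors, e.g. patch features gathered
    across the frames of a clip (a hypothetical layout).
    """
    unit = [normalize(t) for t in tokens]
    return [[sum(a * b for a, b in zip(u, w)) for w in unit] for u in unit]


def gram_alignment_loss(student_tokens, teacher_tokens):
    """Mean squared difference between the two Gram matrices.

    Only the relations between tokens are compared, so the two feature
    sets may have different channel dimensions as long as they contain
    the same number of tokens.
    """
    gs = gram_matrix(student_tokens)
    gt = gram_matrix(teacher_tokens)
    n = len(gs)
    total = sum((gs[i][j] - gt[i][j]) ** 2 for i in range(n) for j in range(n))
    return total / (n * n)
```

Because the loss depends only on token-to-token similarities, rescaling either feature set leaves it unchanged, which is one common reason to align relations rather than raw features.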
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Human Motion and Animation · 3D Shape Modeling and Analysis
