Phantom: Physics-Infused Video Generation via Joint Modeling of Visual and Latent Physical Dynamics

Ying Shen; Jerry Xiong; Tianjiao Yu; Ismini Lourentzou

arXiv:2604.08503·cs.CV·May 20, 2026

TL;DR

Phantom is a novel video generation model that integrates physical dynamics inference with visual content prediction to produce more realistic and physically consistent videos.

Contribution

Phantom introduces a physics-aware representation and a joint modeling approach that improve physical plausibility in generated videos without requiring explicit specification of complex physical properties.

Findings

01 Outperforms existing methods in physical consistency.

02 Produces videos with higher perceptual fidelity.

03 Demonstrates effectiveness on standard and physics-aware benchmarks.

Abstract

Recent advances in generative video modeling, driven by large-scale datasets and powerful architectures, have yielded remarkable visual realism. However, emerging evidence suggests that simply scaling data and model size does not endow these systems with an understanding of the underlying physical laws that govern real-world dynamics. Existing approaches often fail to capture or enforce such physical consistency, resulting in unrealistic motion and dynamics. In this work, we investigate whether integrating the inference of latent physical properties directly into the video generation process can equip models with the ability to produce physically plausible videos. To this end, we propose Phantom, a Physics-Infused Video Generation model that jointly models the visual content and latent physical dynamics. Conditioned on observed video frames and inferred physical states, Phantom jointly…
