TL;DR
This paper introduces a framework for physically plausible video generation that treats a physical phenomenon as a sequence of causally connected events, using event chain reasoning and cross-modal prompting to improve physical realism.
Contribution
It proposes two modules: physics-driven event chain reasoning with causal constraints and transition-aware cross-modal prompting for continuous, causally consistent video synthesis.
Findings
Outperforms existing methods on PhyGenBench and VideoPhy benchmarks.
Effectively models causal event sequences with physical constraints.
Generates more realistic and physically consistent videos across domains.
Abstract
Physically Plausible Video Generation (PPVG) has emerged as a promising avenue for modeling real-world physical phenomena. PPVG requires an understanding of commonsense knowledge, which remains a challenge for video diffusion models. Current approaches leverage the commonsense reasoning capabilities of large language models to embed physical concepts into prompts. However, generation models often render physical phenomena as a single moment defined by the prompt, because they lack conditioning mechanisms for modeling causal progression. In this paper, we view PPVG as generating a sequence of causally connected and dynamically evolving events. To realize this paradigm, we design two key modules: (1) Physics-driven Event Chain Reasoning. This module decomposes the physical phenomena described in prompts into multiple elementary event units, leveraging chain-of-thought reasoning. To mitigate…
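The idea of decomposing one prompt into ordered, causally linked event units can be illustrated with a minimal sketch. This is not the paper's implementation: the names `EventUnit` and `build_event_prompts` are hypothetical, and the event chain is written by hand here, whereas the abstract says it is obtained through chain-of-thought reasoning. The sketch only shows how each event prompt could carry the preceding event as context, so that a generator is conditioned on a progression rather than a single moment.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class EventUnit:
    """One elementary event in a physical phenomenon (hypothetical structure)."""
    description: str  # what happens in this event
    cause: str        # why it follows from the previous event


def build_event_prompts(scene: str, events: List[EventUnit]) -> List[str]:
    """Turn an ordered event chain into one text prompt per event.

    Every prompt after the first names the preceding event, making the
    causal transition explicit in the text.
    """
    prompts = []
    previous = None
    for event in events:
        if previous is None:
            prompts.append(f"{scene}. {event.description}.")
        else:
            prompts.append(
                f"{scene}. After {previous.description}, "
                f"{event.description} because {event.cause}."
            )
        previous = event
    return prompts


# Hand-written example chain (an assumption for illustration only).
chain = [
    EventUnit("a glass tips over the table edge", "it is pushed"),
    EventUnit("the glass falls toward the floor", "gravity accelerates it"),
    EventUnit("the glass shatters on impact", "the impact stress exceeds its strength"),
]
prompts = build_event_prompts("A kitchen with a tiled floor", chain)
for p in prompts:
    print(p)
```

Each resulting prompt would condition one segment of the video. How the segments are joined into a continuous clip is the role the summary assigns to the transition-aware cross-modal prompting module, which this sketch does not attempt to model.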
