Chain of Event-Centric Causal Thought for Physically Plausible Video Generation

Zixuan Wang; Yixin Hu; Haolan Wang; Feng Chen; Yan Liu; Wen Li; Yinjie Lei

arXiv:2603.09094·cs.CV·March 31, 2026

TL;DR

This paper treats physically plausible video generation as producing a sequence of causally connected events, using chain-of-thought event reasoning and cross-modal prompting to improve physical realism.

Contribution

The paper proposes two modules: (1) physics-driven event chain reasoning, which applies causal constraints, and (2) transition-aware cross-modal prompting, which targets continuous, causally consistent video synthesis.
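To make the event-chain idea concrete, here is a minimal Python sketch of how a physical phenomenon might be represented as an ordered chain of elementary events with causal links. This is an illustration only, not the authors' implementation: the names `EventUnit`, `is_causally_ordered`, and `transition_prompts`, and the example chain, are all hypothetical. The paper derives event chains with LLM chain-of-thought reasoning, which this sketch does not attempt.

```python
from dataclasses import dataclass, field


@dataclass
class EventUnit:
    """One elementary event in a physical phenomenon (hypothetical structure)."""
    description: str
    # Indices of earlier events in the chain that cause this one.
    causes: list[int] = field(default_factory=list)


def is_causally_ordered(chain: list[EventUnit]) -> bool:
    """Return True if every event depends only on events that precede it."""
    return all(0 <= c < i for i, event in enumerate(chain) for c in event.causes)


def transition_prompts(chain: list[EventUnit]) -> list[str]:
    """Build one text prompt per consecutive event pair, describing the transition."""
    return [
        f"{prev.description}, then {nxt.description}"
        for prev, nxt in zip(chain, chain[1:])
    ]


# Hand-written example chain (an assumption; not taken from the paper).
chain = [
    EventUnit("a glass tips over the table edge"),
    EventUnit("the glass falls toward the floor", causes=[0]),
    EventUnit("the glass shatters on impact", causes=[1]),
]

if __name__ == "__main__":
    print(is_causally_ordered(chain))
    for prompt in transition_prompts(chain):
        print(prompt)
```

The ordering check is a deliberately simple stand-in for the causal constraints the paper mentions, and the pairwise prompts only gesture at the idea of conditioning on transitions between events rather than on a single moment.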

Findings

01. Outperforms existing methods on the PhyGenBench and VideoPhy benchmarks.

02. Effectively models causal event sequences with physical constraints.

03. Generates more realistic and physically consistent videos across domains.

Abstract

Physically Plausible Video Generation (PPVG) has emerged as a promising avenue for modeling real-world physical phenomena. PPVG requires an understanding of commonsense knowledge, which remains a challenge for video diffusion models. Current approaches leverage the commonsense reasoning capability of large language models to embed physical concepts into prompts. However, generation models often render physical phenomena as a single moment defined by the prompt, because they lack conditioning mechanisms for modeling causal progression. In this paper, we view PPVG as generating a sequence of causally connected and dynamically evolving events. To realize this paradigm, we design two key modules: (1) Physics-driven Event Chain Reasoning. This module decomposes the physical phenomena described in prompts into multiple elementary event units, leveraging chain-of-thought reasoning. To mitigate…

Code & Models

Repositories

ZixuanWang0525/CoECT (GitHub)
