DiReCT: Disentangled Regularization of Contrastive Trajectories for Physics-Refined Video Generation

Abolfazl Meyarian; Amin Karimi Monsefi; Rajiv Ramnath; Ser-Nam Lim

arXiv:2603.25931·cs.CV·March 30, 2026


TL;DR

DiReCT is a regularization framework that disentangles semantic and physical information in contrastive flow matching for text-conditioned video generation, improving physical realism without extra training cost.

Contribution

The paper proposes DiReCT, a post-training method that separates semantic and physical signals in contrastive learning for physics-aware video generation.

Findings

1. Improves physical commonsense score on VideoPhy by 16.7% over baseline.

2. Effectively separates semantic and physical information in contrastive trajectories.

3. Enhances physics consistency in generated videos without increasing training time.

Abstract

Flow-matching video generators produce temporally coherent, high-fidelity outputs yet routinely violate elementary physics because their reconstruction objectives penalize per-frame deviations without distinguishing physically consistent dynamics from impossible ones. Contrastive flow matching offers a principled remedy by pushing apart velocity-field trajectories of differing conditions, but we identify a fundamental obstacle in the text-conditioned video setting: semantic-physics entanglement. Because natural-language prompts couple scene content with physical behavior, naive negative sampling draws conditions whose velocity fields largely overlap with the positive sample's, causing the contrastive gradient to directly oppose the flow-matching objective. We formalize this gradient conflict, deriving a precise alignment condition that reveals when contrastive learning helps versus…
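The gradient conflict described in the abstract can be illustrated with a small numerical sketch. It assumes a commonly used contrastive flow-matching loss of the form ||v - u_pos||^2 - lam * ||v - u_neg||^2, which the abstract does not state explicitly; the vectors, the weight `lam`, and all function names are hypothetical and chosen only for illustration. The sketch compares the direction of the flow-matching gradient with that of the contrastive gradient when the negative target nearly coincides with the positive one and when it clearly differs.

```python
import math

def fm_grad(v, u_pos):
    """Gradient of ||v - u_pos||^2 with respect to the predicted velocity v."""
    return [2.0 * (vi - pi) for vi, pi in zip(v, u_pos)]

def contrastive_grad(v, u_neg, lam):
    """Gradient of -lam * ||v - u_neg||^2 with respect to v (assumed loss form)."""
    return [-2.0 * lam * (vi - ni) for vi, ni in zip(v, u_neg)]

def cosine(a, b):
    """Cosine similarity between two vectors given as lists of floats."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Hypothetical toy values: a predicted velocity and its positive target.
v = [0.5, -0.2, 0.1]
u_pos = [1.0, 0.0, 0.0]

# A negative whose target velocity almost coincides with the positive one
# (the entangled case), and a negative whose target clearly differs.
u_neg_overlapping = [0.98, 0.01, 0.0]
u_neg_distinct = [-1.0, 0.5, 0.3]
lam = 0.05  # hypothetical contrastive weight

g_fm = fm_grad(v, u_pos)
cos_overlap = cosine(g_fm, contrastive_grad(v, u_neg_overlapping, lam))
cos_distinct = cosine(g_fm, contrastive_grad(v, u_neg_distinct, lam))

print(f"overlapping negative: cos = {cos_overlap:.3f}")
print(f"distinct negative:    cos = {cos_distinct:.3f}")
```

Under these assumptions, the overlapping negative yields a contrastive gradient that is almost exactly anti-parallel to the flow-matching gradient, while the distinct negative does not. This is only a toy picture of the conflict; the paper's precise alignment condition is not reproduced here.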
