PhysCorr: Dual-Reward DPO for Physics-Constrained Text-to-Video Generation with Automated Preference Selection

Peiyao Wang; Weining Wang; Qi Li

arXiv:2511.03997 · cs.CV · November 7, 2025

TL;DR

PhysCorr introduces a physics-aware framework for text-to-video generation that improves physical realism by modeling, evaluating, and optimizing physical consistency. This addresses a key obstacle to deploying such models in embodied AI, robotics, and simulation-intensive applications.

Contribution

The paper presents PhysCorr, a unified framework that combines a reward model (PhysicsRM) and a preference-optimization pipeline (PhyDPO) to model and optimize physical plausibility in video generation. Unlike prior approaches that optimize for perceptual quality alone, it targets physical consistency directly.

Findings

01

PhysCorr improves physical realism in generated videos.

02

The framework maintains high visual fidelity and semantic accuracy.

03

It is compatible with various video generation models.

Abstract

Recent advances in text-to-video generation have achieved impressive perceptual quality, yet generated content often violates fundamental principles of physical plausibility, manifesting as implausible object dynamics, incoherent interactions, and unrealistic motion patterns. Such failures hinder the deployment of video generation models in embodied AI, robotics, and simulation-intensive domains. To bridge this gap, we propose PhysCorr, a unified framework for modeling, evaluating, and optimizing physical consistency in video generation. Specifically, we introduce PhysicsRM, the first dual-dimensional reward model that quantifies both intra-object stability and inter-object interactions. On this foundation, we develop PhyDPO, a novel direct preference optimization pipeline that leverages contrastive feedback and physics-aware reweighting to guide generation toward physically coherent…
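The abstract names the ingredients of PhyDPO (a two-dimensional PhysicsRM score, automated preference selection, and DPO with physics-aware reweighting) but the excerpt ends before specifying them. The Python sketch that follows shows one plausible way such a pipeline could be wired together; it is not the authors' method. Assumptions: the two reward dimensions are combined as a convex mix with a hypothetical weight `alpha`; preference pairs are chosen as the best- and worst-scoring candidates for a prompt, kept only above a hypothetical `min_gap`; and the "physics-aware" weight is simply the reward gap multiplying the standard DPO loss. The names `Candidate`, `physics_reward`, `select_pair`, and `weighted_dpo_loss`, and all numbers in the usage lines, are illustrative.

```python
import math
from dataclasses import dataclass


@dataclass
class Candidate:
    video_id: str
    intra: float        # hypothetical intra-object stability score in [0, 1]
    inter: float        # hypothetical inter-object interaction score in [0, 1]
    logp_policy: float  # log pi_theta(video | prompt) under the model being tuned
    logp_ref: float     # log pi_ref(video | prompt) under the frozen reference model


def physics_reward(c: Candidate, alpha: float = 0.5) -> float:
    """Assumed combination of the two PhysicsRM dimensions as a convex mix."""
    return alpha * c.intra + (1.0 - alpha) * c.inter


def select_pair(cands, alpha=0.5, min_gap=0.1):
    """Assumed automated selection: best vs. worst candidate, skipped if the gap is small."""
    ranked = sorted(cands, key=lambda c: physics_reward(c, alpha), reverse=True)
    win, lose = ranked[0], ranked[-1]
    gap = physics_reward(win, alpha) - physics_reward(lose, alpha)
    return (win, lose, gap) if gap >= min_gap else None


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def weighted_dpo_loss(win: Candidate, lose: Candidate, gap: float, beta: float = 0.1) -> float:
    """Standard DPO loss, scaled by an assumed physics-aware weight (the reward gap)."""
    margin = (win.logp_policy - win.logp_ref) - (lose.logp_policy - lose.logp_ref)
    return -gap * log_sigmoid(beta * margin)


# Illustrative usage with made-up scores and log-probabilities.
cands = [
    Candidate("a", intra=0.9, inter=0.8, logp_policy=-10.0, logp_ref=-10.5),
    Candidate("b", intra=0.3, inter=0.2, logp_policy=-11.0, logp_ref=-10.8),
    Candidate("c", intra=0.6, inter=0.5, logp_policy=-10.2, logp_ref=-10.2),
]
pair = select_pair(cands)
if pair is not None:
    win, lose, gap = pair
    loss = weighted_dpo_loss(win, lose, gap)
```

Weighting by the reward gap means pairs whose physical plausibility differs more push the policy harder, while near-ties contribute little or are filtered out by `min_gap`; the paper may use a different weighting scheme entirely.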

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Human Motion and Animation · Multimodal Machine Learning Applications