PhyDetEx: Detecting and Explaining the Physical Plausibility of T2V Models

Zeqing Wang; Keze Wang; Lei Zhang

arXiv:2512.01843 · cs.CV · May 19, 2026

TL;DR

This paper introduces the PID dataset and PhyDetEx, a fine-tuned VLM approach for detecting and explaining physical implausibility in videos generated by Text-to-Video (T2V) models, and shows that current models still have a limited grasp of physics.

Contribution

It presents a new dataset and a fine-tuning approach for vision-language models (VLMs), and it benchmarks T2V models' adherence to physical laws.

Findings

01. Recent T2V models show progress but still struggle with physical plausibility.

02. Fine-tuned VLMs can detect and explain physical violations in generated videos.

03. Open-source T2V models in particular struggle to understand and follow physical laws.

Abstract

Driven by the growing capacity and training scale, Text-to-Video (T2V) generation models have recently achieved substantial progress in video quality, length, and instruction-following capability. However, whether these models can understand physics and generate physically plausible videos remains a question. While Vision-Language Models (VLMs) have been widely used as general-purpose evaluators in various applications, they struggle to identify the physically impossible content from generated videos. To investigate this issue, we construct a PID (Physical Implausibility Detection) dataset, which consists of a test split of 500 manually annotated videos and a train split of 2,588 paired videos, where each implausible video is generated by carefully rewriting the caption of its corresponding real-world video to induce T2V models…

Code & Models

Repositories

Zeqing-Wang/PhyDetEx (GitHub)

Taxonomy

Topics: Explainable Artificial Intelligence (XAI) · Multimodal Machine Learning Applications · Generative Adversarial Networks and Image Synthesis