Multi-turn Physics-informed Vision-language Model for Physics-grounded Anomaly Detection
Yao Gu, Xiaohao Xu, and Yingna Wu

TL;DR
This paper presents a physics-informed instruction tuning framework for vision-language models, significantly improving their ability to detect physics-grounded anomalies in videos by incorporating dynamic constraints and causal reasoning.
Contribution
The authors introduce a novel physics-informed instruction tuning method that encodes physical priors into structured prompts, enhancing anomaly detection and causal explanation capabilities of vision-language models.
Findings
Achieves 96.7% AUROC on Phys-AD benchmark, outperforming previous SOTA (66.9%)
Enables robust causal reasoning and explanations of dynamic anomalies
Demonstrates the effectiveness of structured physics priors in vision-language models
Abstract
Vision-Language Models (VLMs) demonstrate strong general-purpose reasoning but remain limited in physics-grounded anomaly detection, where causal understanding of dynamics is essential. Existing VLMs, trained predominantly on appearance-centric correlations, fail to capture kinematic constraints, leading to poor performance on anomalies such as irregular rotations or violated mechanical motions. We introduce a physics-informed instruction tuning framework that explicitly encodes object properties, motion paradigms, and dynamic constraints into structured prompts. By delivering these physical priors through multi-turn dialogues, our method decomposes causal reasoning into incremental steps, enabling robust internal representations of normal and abnormal dynamics. Evaluated on the Phys-AD benchmark, our approach achieves 96.7% AUROC in video-level detection--substantially outperforming…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsAnomaly Detection Techniques and Applications · Multimodal Machine Learning Applications · Explainable Artificial Intelligence (XAI)
