HOCA-Bench: Beyond Semantic Perception to Predictive World Modeling via Hegelian Ontological-Causal Anomalies
Chang Liu, Yunfan Ye, Qingyang Zhou, Xichen Tan, Mengxuan Luo, Zhenyu Qiu, Wei Peng, Zhiping Cai

TL;DR
HOCA-Bench introduces a new benchmark for evaluating video-language models on their ability to understand and predict physical anomalies, highlighting current models' limitations in causal reasoning about physical laws.
Contribution
This work presents HOCA-Bench, a novel benchmark that categorizes physical anomalies and evaluates Video-LLMs' reasoning on them, revealing gaps in causal physical understanding.
Findings
Models excel at static ontological violations but struggle with causal anomalies.
Performance drops over 20% on causal reasoning tasks.
Current models recognize patterns but lack understanding of physical laws.
Abstract
Video-LLMs have improved steadily on semantic perception, but they still fall short on predictive world modeling, which is central to physically grounded intelligence. We introduce HOCA-Bench, a benchmark that frames physical anomalies through a Hegelian lens. HOCA-Bench separates anomalies into two types: ontological anomalies, where an entity violates its own definition or persistence, and causal anomalies, where interactions violate physical relations. Using state-of-the-art generative video models as adversarial simulators, we build a testbed of 1,439 videos (3,470 QA pairs). Evaluations on 17 Video-LLMs show a clear cognitive lag: models often identify static ontological violations (e.g., shape mutations) but struggle with causal mechanisms (e.g., gravity or friction), with performance dropping by more than 20% on causal tasks. System-2 "Thinking" modes improve reasoning, but they…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsGenerative Adversarial Networks and Image Synthesis · Embodied and Extended Cognition · Multimodal Machine Learning Applications
