HOCA-Bench: Beyond Semantic Perception to Predictive World Modeling via Hegelian Ontological-Causal Anomalies

Chang Liu; Yunfan Ye; Qingyang Zhou; Xichen Tan; Mengxuan Luo; Zhenyu Qiu; Wei Peng; Zhiping Cai

arXiv:2602.19571 · cs.CV · February 24, 2026

TL;DR

HOCA-Bench introduces a new benchmark for evaluating video-language models on their ability to understand and predict physical anomalies, highlighting current models' limitations in causal reasoning about physical laws.

Contribution

This work presents HOCA-Bench, a benchmark that divides physical anomalies into ontological and causal types and evaluates how Video-LLMs reason about each, revealing gaps in causal physical understanding.

Findings

01

Models often identify static ontological violations (e.g., shape mutations) but struggle with causal anomalies (e.g., gravity or friction).

02

Performance drops by more than 20% on causal tasks.

03

Current models recognize patterns but lack understanding of physical laws.

Abstract

Video-LLMs have improved steadily on semantic perception, but they still fall short on predictive world modeling, which is central to physically grounded intelligence. We introduce HOCA-Bench, a benchmark that frames physical anomalies through a Hegelian lens. HOCA-Bench separates anomalies into two types: ontological anomalies, where an entity violates its own definition or persistence, and causal anomalies, where interactions violate physical relations. Using state-of-the-art generative video models as adversarial simulators, we build a testbed of 1,439 videos (3,470 QA pairs). Evaluations on 17 Video-LLMs show a clear cognitive lag: models often identify static ontological violations (e.g., shape mutations) but struggle with causal mechanisms (e.g., gravity or friction), with performance dropping by more than 20% on causal tasks. System-2 "Thinking" modes improve reasoning, but they…
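The gap between the two anomaly types can be illustrated with a small per-category accuracy computation. The sketch below is not the authors' evaluation code: the record layout, the function names `accuracy_by_category` and `causal_gap`, and the toy data are all hypothetical, and it assumes the reported drop is an absolute difference in percentage points, which the abstract does not specify.

```python
from collections import defaultdict


def accuracy_by_category(records):
    """Return {category: accuracy} for (category, is_correct) pairs.

    The (category, is_correct) layout is an assumed format for
    illustration, not the benchmark's actual schema.
    """
    totals = defaultdict(int)
    correct = defaultdict(int)
    for category, is_correct in records:
        totals[category] += 1
        correct[category] += int(is_correct)
    return {c: correct[c] / totals[c] for c in totals}


def causal_gap(records):
    """Ontological accuracy minus causal accuracy, in percentage points.

    Assumes an absolute (not relative) difference.
    """
    acc = accuracy_by_category(records)
    return (acc["ontological"] - acc["causal"]) * 100.0


# Made-up toy records, not results from the paper.
toy_records = [
    ("ontological", True), ("ontological", True),
    ("ontological", True), ("ontological", False),
    ("causal", True), ("causal", False),
    ("causal", False), ("causal", False),
]

if __name__ == "__main__":
    print(accuracy_by_category(toy_records))
    print(f"gap: {causal_gap(toy_records):.1f} points")
```

Reporting accuracy separately per category, rather than as a single pooled score, is what makes this kind of gap visible.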

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Embodied and Extended Cognition · Multimodal Machine Learning Applications