TimeCausality: Evaluating the Causal Ability in Time Dimension for Vision Language Models

Zeqing Wang; Shiyuan Zhang; Chengpei Tang; Keze Wang

arXiv:2505.15435 · cs.CV · May 22, 2025

TL;DR

This paper introduces TimeCausality, a benchmark to evaluate vision-language models' ability to understand and reason about temporal causality, revealing current models' limitations in this aspect.

Contribution

The paper presents a new benchmark, TimeCausality, specifically designed to assess temporal causal reasoning in vision-language models, highlighting the gap in current models' capabilities.

Findings

01. Current SOTA open-source VLMs perform poorly on TimeCausality.

02. GPT-4o shows a performance drop on TimeCausality compared to other tasks.

03. There is a significant gap between open-source and closed-source models in temporal causal reasoning.

Abstract

Reasoning about temporal causality, particularly irreversible transformations of objects governed by real-world knowledge (e.g., fruit decay and human aging), is a fundamental aspect of human visual understanding. Unlike temporal perception based on simple event sequences, this form of reasoning requires a deeper comprehension of how object states change over time. Although the current powerful Vision-Language Models (VLMs) have demonstrated impressive performance on a wide range of downstream tasks, their capacity to reason about temporal causality remains underexplored. To address this gap, we introduce TimeCausality, a novel benchmark specifically designed to evaluate the causal reasoning ability of VLMs in the temporal dimension. Based on our TimeCausality, we find that while the current SOTA open-source VLMs have achieved performance levels comparable to closed-source…

Code & Models

Repositories

zeqing-wang/timecausality (Official)

Taxonomy

Topics: Multimodal Machine Learning Applications · Speech and dialogue systems · Natural Language Processing Techniques