CELLO: Causal Evaluation of Large Vision-Language Models
Meiqi Chen, Bo Peng, Yan Zhang, Chaochao Lu

TL;DR
This paper introduces CELLO, a comprehensive dataset and evaluation framework for assessing and improving the causal reasoning abilities of large vision-language models, highlighting current limitations and potential enhancements.
Contribution
The paper presents a new dataset, CELLO, with explicit causal graphs and a novel prompting strategy, CELLO-CoT, to evaluate and enhance causal reasoning in LVLMs.
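As a rough illustration of what an explicit causal graph buys an evaluator, the sketch below stores a small directed graph and checks whether one entity is a direct or indirect cause of another. The node names, the `causal_graph` layout, and the `is_cause_of` helper are all hypothetical and are not taken from the CELLO dataset or its code.

```python
from collections import deque

# Hypothetical causal graph for a single scene: each key is a cause and
# each value lists its direct effects. Names are invented for illustration.
causal_graph = {
    "person": ["ladder"],
    "ladder": ["paint_can"],
    "paint_can": [],
}


def is_cause_of(graph, source, target):
    """Return True if a directed path leads from source to target."""
    if source == target:
        return False
    seen = {source}
    queue = deque([source])
    # Breadth-first search along cause -> effect edges.
    while queue:
        node = queue.popleft()
        for child in graph.get(node, []):
            if child == target:
                return True
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return False
```

With the graph written down, a question such as "does the person influence the paint can?" has a checkable answer, which is the property that commonsense-only benchmarks lack.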
Findings
LVLMs struggle with causal reasoning tasks.
CELLO-CoT improves model performance on causal questions.
Explicit causal graphs aid in understanding model reasoning.
Abstract
Causal reasoning is fundamental to human intelligence and crucial for effective decision-making in real-world environments. Despite recent advancements in large vision-language models (LVLMs), their ability to comprehend causality remains unclear. Previous work typically focuses on commonsense causality between events and/or actions, which is insufficient for applications like embodied agents and lacks the explicitly defined causal graphs required for formal causal reasoning. To overcome these limitations, we introduce a fine-grained and unified definition of causality involving interactions between humans and/or objects. Building on the definition, we construct a novel dataset, CELLO, consisting of 14,094 causal questions across all four levels of causality: discovery, association, intervention, and counterfactual. This dataset surpasses traditional commonsense causality by including…
Taxonomy
Topics: Multimodal Machine Learning Applications
