ExpliCa: Evaluating Explicit Causal Reasoning in Large Language Models

Martina Miliani; Serena Auriemma; Alessandro Bondielli; Emmanuele Chersoni; Lucia Passaro; Irene Sucameli; Alessandro Lenci

arXiv:2502.15487·cs.CL·February 10, 2026

TL;DR

ExpliCa is a new dataset for evaluating explicit causal reasoning in large language models. Experiments on it show that even top models struggle to reach 0.80 accuracy and tend to confuse temporal relations with causal ones.

Contribution

The paper presents ExpliCa, a dataset that tests LLMs on explicit causal reasoning. It combines causal and temporal relations, expressed through explicit connectives and presented in different linguistic orders, with crowdsourced human acceptability ratings.

Findings

01

Even the top models among the seven evaluated struggle to reach 0.80 accuracy.

02

Models often confuse temporal relations with causal ones.

03

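Performance is strongly influenced by the linguistic order of the events, and model size affects perplexity-based scores and prompting performance differently.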

Abstract

Large Language Models (LLMs) are increasingly used in tasks requiring interpretive and inferential accuracy. In this paper, we introduce ExpliCa, a new dataset for evaluating LLMs in explicit causal reasoning. ExpliCa uniquely integrates both causal and temporal relations presented in different linguistic orders and explicitly expressed by linguistic connectives. The dataset is enriched with crowdsourced human acceptability ratings. We tested LLMs on ExpliCa through prompting and perplexity-based metrics. We assessed seven commercial and open-source LLMs, revealing that even top models struggle to reach 0.80 accuracy. Interestingly, models tend to confound temporal relations with causal ones, and their performance is also strongly influenced by the linguistic order of the events. Finally, perplexity-based scores and prompting performance are differently affected by model size.
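
The abstract mentions perplexity-based metrics without spelling them out. As a rough illustration only, and not the authors' actual procedure, the sketch below computes the standard perplexity of a sentence from per-token log-probabilities and picks the connective whose sentence scores lowest. The function names, the candidate connectives, and the log-probability values are all hypothetical; in practice the log-probabilities would come from a language model.

```python
import math


def perplexity(token_logprobs):
    """Perplexity: exp of the negative mean natural-log token probability."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    return math.exp(-sum(token_logprobs) / len(token_logprobs))


def pick_connective(candidates):
    """Return the connective whose sentence has the lowest perplexity.

    `candidates` maps a connective to the token log-probabilities of the
    full sentence built with that connective (hypothetical input format).
    """
    return min(candidates, key=lambda name: perplexity(candidates[name]))


# Made-up log-probabilities, for illustration only (not from the paper).
example = {
    "so": [-1.0, -0.5, -0.5],
    "then": [-1.0, -1.5, -0.5],
}

best = pick_connective(example)
```

Lower perplexity means the model finds the sentence more likely, so under this assumed setup the selected connective is treated as the model's preferred relation for the event pair.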

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques