Can Large Language Models Infer Causal Relationships from Real-World Text?

Ryan Saklad; Aman Chadha; Oleg Pavlov; Raha Moraffah

arXiv:2505.18931 · cs.AI · April 14, 2026

TL;DR

This paper introduces a benchmark dataset drawn from real-world academic texts to evaluate how well large language models infer causal relationships. The results show that current models find the task difficult, and the analysis points to directions for future research.

Contribution

It presents what the authors describe as the first real-world dataset for inferring causal relationships from text, and analyzes LLM performance across diverse, complex real-world scenarios.

Findings

1. The evaluated LLMs achieve an average F1 score of 0.535 on the benchmark.
2. Performance varies with how explicitly causal relations are stated, the number of causal relations, text length, and domain.
3. The benchmark provides targeted insights for improving LLM causal reasoning.
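To make the F1 figure concrete, here is a minimal sketch of how an F1 score over extracted causal relations can be computed. It assumes, as a simplification, that each causal relation is a directed (cause, effect) pair, that a prediction counts as correct only on an exact match after lowercasing and trimming, and that per-text scores are macro-averaged. The function names (`relation_f1`, `average_f1`) and the example entities are hypothetical; the benchmark's actual matching and aggregation procedure is not described on this page and may differ (for example, by matching node names semantically rather than exactly).

```python
from typing import Iterable, List, Tuple

# Assumption: a causal relation is a directed (cause, effect) pair of node names.
Relation = Tuple[str, str]


def normalize(relation: Relation) -> Relation:
    """Lowercase and strip both node names so trivial surface differences are ignored."""
    cause, effect = relation
    return cause.strip().lower(), effect.strip().lower()


def relation_f1(predicted: Iterable[Relation], gold: Iterable[Relation]) -> float:
    """Set-based F1 between predicted and gold directed edges (exact match after normalization)."""
    pred_set = {normalize(r) for r in predicted}
    gold_set = {normalize(r) for r in gold}
    if not pred_set and not gold_set:
        return 1.0  # nothing to find and nothing predicted
    true_positives = len(pred_set & gold_set)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(pred_set)
    recall = true_positives / len(gold_set)
    return 2 * precision * recall / (precision + recall)


def average_f1(per_text_scores: Iterable[float]) -> float:
    """Macro-average: the mean of per-text F1 scores (0.0 if there are no texts)."""
    scores: List[float] = list(per_text_scores)
    return sum(scores) / len(scores) if scores else 0.0


if __name__ == "__main__":
    # Hypothetical gold graph and model output for a single text.
    gold = [("smoking", "lung cancer"), ("lung cancer", "mortality")]
    pred = [("Smoking", "lung cancer"), ("mortality", "lung cancer")]  # second edge is reversed
    print(relation_f1(pred, gold))  # → 0.5
```

Because edges are directed, the reversed pair in the example counts as both a false positive and a missed gold relation, so precision and recall are each 1/2.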

Abstract

Understanding and inferring causal relationships from texts is a core aspect of human cognition and is essential for advancing large language models (LLMs) towards artificial general intelligence. Existing work evaluating LLM causal reasoning primarily relies on synthetic or simplified texts with explicitly stated causal relationships. These texts typically feature short passages and few causal relations, failing to reflect the complexities of real-world reasoning. In this paper, we investigate whether LLMs are capable of inferring causal relationships from real-world texts. We develop a benchmark drawn from real-world academic literature, which includes diverse texts with respect to length, complexity (different levels of explicitness, number of causal events and relationships), and domain. To the best of our knowledge, our benchmark is the first-ever real-world dataset for this task.…

Code & Models

Repositories

Ryan-Saklad/ReCITE (GitHub)

Datasets

RyanSaklad/ReCITE (dataset, 156 downloads)