Do Large Language Models Reason Causally Like Us? Even Better?

Hanna M. Dettki; Brenden M. Lake; Charley M. Wu; Bob Rehder

arXiv:2502.10215 · cs.AI · June 9, 2025 · 2 cites

TL;DR

This study compares causal reasoning in humans and large language models. Some models perform comparably to or better than humans and lack certain human biases, but they still miss subtler reasoning patterns.

Contribution

The paper provides a systematic comparison of LLMs' causal reasoning against that of humans, highlighting the models' strengths and limitations on the inference patterns associated with collider graphs.

Findings

01

GPT-4o, Gemini-Pro, and Claude outperform GPT-3.5 in causal reasoning

02

Some LLMs lack associative bias present in humans

03

Models still struggle with subtle collider graph reasoning patterns

Abstract

Causal reasoning is a core component of intelligence. Large language models (LLMs) have shown impressive capabilities in generating human-like text, raising questions about whether their responses reflect true understanding or statistical patterns. We compared causal reasoning in humans and four LLMs using tasks based on collider graphs, rating the likelihood of a query variable occurring given evidence from other variables. LLMs' causal inferences ranged from often nonsensical (GPT-3.5) to human-like to often more normatively aligned than those of humans (GPT-4o, Gemini-Pro, and Claude). Computational model fitting showed that one reason for GPT-4o, Gemini-Pro, and Claude's superior performance is they didn't exhibit the "associative bias" that plagues human causal reasoning. Nevertheless, even these LLMs did not fully capture subtler reasoning patterns associated with collider graphs,…
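To make the task in the abstract concrete, the following is a minimal sketch of normative inference on a collider graph (two independent causes C1 and C2 with a common effect E). It is an illustration only, not the paper's method: the noisy-OR parameterization, the names `PRIOR`, `STRENGTH`, `LEAK`, and the helper `query` are all assumptions introduced here, and the numeric values are hypothetical.

```python
from itertools import product

# Hypothetical parameters chosen for illustration; they are not from the paper.
PRIOR = 0.5     # P(C1=1) = P(C2=1), causes assumed independent
STRENGTH = 0.8  # probability that an active cause produces the effect
LEAK = 0.1      # probability that the effect occurs with no active cause


def p_effect(c1, c2):
    """Noisy-OR likelihood P(E=1 | C1=c1, C2=c2) (an assumed functional form)."""
    p_off = (1 - LEAK) * (1 - STRENGTH) ** (c1 + c2)
    return 1 - p_off


def joint(c1, c2, e):
    """Joint probability P(C1=c1, C2=c2, E=e) for the collider C1 -> E <- C2."""
    p_causes = (PRIOR if c1 else 1 - PRIOR) * (PRIOR if c2 else 1 - PRIOR)
    p_e = p_effect(c1, c2)
    return p_causes * (p_e if e else 1 - p_e)


def query(target, evidence):
    """P(target=1 | evidence), computed by enumerating all eight states."""
    num = den = 0.0
    for c1, c2, e in product((0, 1), repeat=3):
        state = {"C1": c1, "C2": c2, "E": e}
        if any(state[name] != value for name, value in evidence.items()):
            continue  # state is inconsistent with the observed evidence
        p = joint(c1, c2, e)
        den += p
        if state[target] == 1:
            num += p
    return num / den


if __name__ == "__main__":
    print(query("C1", {"E": 1}))            # effect observed
    print(query("C1", {"E": 1, "C2": 1}))   # effect and other cause observed
    print(query("C1", {"C2": 1}))           # only the other cause observed
```

Under these assumptions, learning that the other cause is present lowers the probability of C1 once the effect is known ("explaining away"), while observing C2 alone leaves the probability of C1 at its prior, because the two causes are independent when the effect is unobserved.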

Taxonomy

Topics: Topic Modeling · Explainable Artificial Intelligence (XAI) · Natural Language Processing Techniques

Methods: Attention Is All You Need · Cosine Annealing · Layer Normalization · Residual Connection · Linear Layer · Dense Connections · Multi-Head Attention