Causal Reasoning and Large Language Models: Opening a New Frontier for Causality
Emre Kıcıman, Robert Ness, Amit Sharma, Chenhao Tan

TL;DR
This paper benchmarks large language models' ability to generate causal arguments. It shows that they outperform existing methods on several causal reasoning tasks and can assist human experts in causal analysis, despite some unpredictable failures.
Contribution
It demonstrates that LLMs can effectively perform causal reasoning tasks, surpassing prior algorithms, and explores their potential to aid in causal analysis across domains.
Findings
LLMs outperform existing algorithms on causal discovery and causal reasoning tasks
LLMs generalize well to datasets created after their training cutoff
LLMs exhibit some unpredictable failure modes
Abstract
The causal capabilities of large language models (LLMs) are a matter of significant debate, with critical implications for the use of LLMs in societally impactful domains such as medicine, science, law, and policy. We conduct a "behavioral" study of LLMs to benchmark their capability in generating causal arguments. Across a wide range of tasks, we find that LLMs can generate text corresponding to correct causal arguments with high probability, surpassing the best-performing existing methods. Algorithms based on GPT-3.5 and GPT-4 outperform existing algorithms on a pairwise causal discovery task (97%, a 13-point gain), a counterfactual reasoning task (92%, a 20-point gain), and event causality (86% accuracy in determining necessary and sufficient causes in vignettes). We perform robustness checks across tasks and show that the capabilities cannot be explained by dataset memorization alone,…
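One way to picture the pairwise causal discovery task is as a two-option question for each variable pair, scored by accuracy against known causal directions. The sketch below assumes a hypothetical `llm` callable that takes a prompt string and returns the model's text reply. The prompt wording, the helper names (`pairwise_prompt`, `predict_direction`, `pairwise_accuracy`), and the toy pairs are illustrative assumptions, not the authors' exact setup.

```python
from typing import Callable, Sequence, Tuple

# Hypothetical model interface (assumption): prompt text in, reply text out.
LLM = Callable[[str], str]


def pairwise_prompt(x: str, y: str) -> str:
    """Frame one variable pair as a two-option question about causal direction."""
    return (
        "Which cause-and-effect relationship is more likely?\n"
        f"A. changing {x} causes a change in {y}.\n"
        f"B. changing {y} causes a change in {x}.\n"
        "Answer with A or B."
    )


def predict_direction(llm: LLM, x: str, y: str) -> str:
    """Map the reply's first letter to 'x->y' (A) or 'y->x' (B); anything else is 'unparsed'."""
    reply = llm(pairwise_prompt(x, y)).strip().upper()
    if reply.startswith("A"):
        return "x->y"
    if reply.startswith("B"):
        return "y->x"
    return "unparsed"  # counted as a miss when scoring


def pairwise_accuracy(llm: LLM, pairs: Sequence[Tuple[str, str, str]]) -> float:
    """Fraction of (x, y, true_direction) triples whose direction the model gets right."""
    if not pairs:
        return 0.0
    correct = sum(predict_direction(llm, x, y) == truth for x, y, truth in pairs)
    return correct / len(pairs)


# Toy stand-in for a real model (hypothetical): it always answers "A".
def always_a(prompt: str) -> str:
    return "A"


# Illustrative pairs with hand-labelled directions (not benchmark data).
toy_pairs = [
    ("altitude", "temperature", "x->y"),
    ("rainfall", "crop yield", "x->y"),
    ("temperature", "altitude", "y->x"),
]

print(pairwise_accuracy(always_a, toy_pairs))
```

The always-"A" stub is included to show why a real evaluation compares against ground truth rather than fixed answers. It scores 2 of 3 here only because most toy pairs happen to list the cause first. Swapping the order of each pair's variables is one simple way to detect that kind of position bias.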
