LLM Cannot Discover Causality, and Should Be Restricted to Non-Decisional Support in Causal Discovery
Xingyu Wu, Kui Yu, Jibin Wu, Kay Chen Tan

TL;DR
This paper argues that large language models (LLMs) are unsuitable for causal discovery due to their correlation-based reasoning, and should only serve as auxiliary tools rather than decision-makers in causal inference tasks.
Contribution
The paper provides empirical evidence of LLMs' limitations in causal reasoning and proposes restricting them to non-decisional support, where LLM-guided heuristic search improves the efficiency of causal structure learning.
Findings
LLMs lack the theoretical basis for causal reasoning.
Prompt engineering can overstate LLM performance.
LLM-guided heuristic search accelerates causal discovery.
Abstract
This paper critically re-evaluates LLMs' role in causal discovery and argues against their direct involvement in determining causal relationships. We demonstrate that LLMs' autoregressive, correlation-driven modeling inherently lacks the theoretical grounding for causal reasoning and introduces unreliability when used as priors in causal discovery algorithms. Through empirical studies, we expose the limitations of existing LLM-based methods and reveal that deliberate prompt engineering (e.g., injecting ground-truth knowledge) could overstate their performance, helping to explain the consistently favorable results reported in much of the current literature. Based on these findings, we strictly confine LLMs' role to a non-decisional auxiliary capacity: LLMs should not participate in determining the existence or directionality of causal relationships, but can assist the search process for…
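The abstract is truncated here, so the authors' actual procedure is not shown. As a rough illustration of what "non-decisional" support could look like, the sketch below is a generic greedy edge-addition search, not the paper's method. The names `guided_greedy_search`, `score_gain`, `hint_rank`, and `budget` are hypothetical. The hint (which might come from an LLM) only changes the order in which candidate edges are scored; whether an edge is kept depends solely on the data-driven `score_gain` callback supplied by the caller.

```python
from itertools import permutations


def creates_cycle(edges, new_edge):
    """Return True if adding new_edge (u, v) would create a directed cycle."""
    u, v = new_edge
    # A cycle appears exactly when u is already reachable from v.
    stack, seen = [v], set()
    while stack:
        node = stack.pop()
        if node == u:
            return True
        if node in seen:
            continue
        seen.add(node)
        stack.extend(w for (x, w) in edges if x == node)
    return False


def guided_greedy_search(variables, score_gain, hint_rank=None, budget=None):
    """Single-pass greedy edge addition (illustrative sketch only).

    score_gain(edges, edge): hypothetical data-driven callback (for example a
        BIC difference); an edge is accepted only if it returns a value > 0.
    hint_rank: optional dict mapping edge -> priority (lower is tried first).
        It reorders candidates and never accepts or rejects an edge.
    budget: optional cap on the number of score evaluations.
    """
    candidates = list(permutations(variables, 2))
    if hint_rank is not None:
        # Stable sort: edges without a hint keep their default order at the end.
        candidates.sort(key=lambda e: hint_rank.get(e, float("inf")))
    edges, evaluated = set(), 0
    for edge in candidates:
        if budget is not None and evaluated >= budget:
            break
        if creates_cycle(edges, edge):
            continue
        evaluated += 1
        if score_gain(edges, edge) > 0:  # the data alone decides acceptance
            edges.add(edge)
    return edges, evaluated


# Toy stand-in for a data-driven score: positive gain only for two fixed edges.
TOY_EDGES = {("A", "B"), ("B", "C")}


def toy_gain(edges, edge):
    return 1.0 if edge in TOY_EDGES else -1.0


unguided = guided_greedy_search(["A", "B", "C"], toy_gain)
guided = guided_greedy_search(
    ["A", "B", "C"], toy_gain,
    hint_rank={("B", "C"): 0, ("A", "B"): 1}, budget=2,
)
misled = guided_greedy_search(["A", "B", "C"], toy_gain, hint_rank={("C", "A"): 0})
```

In this toy setup a helpful hint lets the search reach the same graph with fewer score evaluations, while a misleading hint costs extra evaluations but cannot add an edge that the score rejects. A real implementation would replace `toy_gain` with a score computed from observational data.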
Taxonomy
Topics: Logic, Reasoning, and Knowledge · Bayesian Modeling and Causal Inference
