Structured Thinking Matters: Improving LLMs Generalization in Causal Inference Tasks
Wentao Sun, João Paulo Nogueira, and Alonso Silva

TL;DR
This paper introduces a structured knowledge graph approach to improve large language models' ability to distinguish causation from correlation, significantly enhancing their performance on causal inference benchmarks.
Contribution
The paper proposes a novel method that guides LLMs to build structured knowledge graphs, improving causal reasoning beyond traditional prompting techniques.
Findings
F1 score improved from 32.71 to 48.26 on the Corr2Cause benchmark
Significant gains in precision and recall observed
Method demonstrates potential for broader causal inference tasks
Abstract
Despite remarkable advances in the field, LLMs remain unreliable in distinguishing causation from correlation. Recent results from the Corr2Cause dataset benchmark reveal that state-of-the-art LLMs -- such as GPT-4 (F1 score: 29.08) -- only marginally outperform random baselines (Random Uniform, F1 score: 20.38), indicating a limited capacity for generalization. To tackle this limitation, we propose a novel structured approach: rather than directly answering causal queries, we provide the model with the capability to structure its thinking by guiding the model to build a structured knowledge graph, systematically encoding the provided correlational premises, to answer the causal queries. This intermediate representation significantly enhances the model's causal capabilities. Experiments on the test subset of the Corr2Cause dataset benchmark with the Qwen3-32B model (a reasoning model) show…
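To make the idea of an intermediate graph representation concrete, here is a minimal sketch in Python. It is not the authors' implementation: the premise wording, the PremiseGraph class, and the build_graph and candidate_edges helpers are all hypothetical, chosen only to illustrate how correlational premises could be encoded in a structured form before a causal query is answered. The edge-pruning rule is the standard skeleton step from constraint-based causal discovery, used here as an assumption about what such a structure enables.

```python
import re
from dataclasses import dataclass, field

# Hypothetical premise patterns; real benchmark wording may differ.
CORR = re.compile(r"\b([A-Z]) correlates with ([A-Z])\b")
INDEP = re.compile(r"\b([A-Z]) and ([A-Z]) are independent(?: given ([^.]+))?")


@dataclass
class PremiseGraph:
    """Illustrative container for the statements found in a premise."""
    variables: list = field(default_factory=list)
    correlations: list = field(default_factory=list)    # (X, Y) pairs
    independencies: list = field(default_factory=list)  # ((X, Y), given) pairs


def build_graph(premise: str) -> PremiseGraph:
    """Encode correlation and (conditional) independence statements."""
    graph = PremiseGraph()
    for x, y in CORR.findall(premise):
        graph.correlations.append(tuple(sorted((x, y))))
    for m in INDEP.finditer(premise):
        # Single capital letters after "given" form the conditioning set.
        given = re.findall(r"\b[A-Z]\b", m.group(3) or "")
        pair = tuple(sorted((m.group(1), m.group(2))))
        graph.independencies.append((pair, tuple(sorted(given))))
    names = {v for pair in graph.correlations for v in pair}
    for pair, given in graph.independencies:
        names.update(pair)
        names.update(given)
    graph.variables = sorted(names)
    return graph


def candidate_edges(graph: PremiseGraph) -> list:
    """Keep correlated pairs that no recorded independence separates."""
    separated = {pair for pair, _ in graph.independencies}
    return [pair for pair in graph.correlations if pair not in separated]


premise = (
    "A correlates with B. A correlates with C. B correlates with C. "
    "However, B and C are independent given A."
)
graph = build_graph(premise)
print(graph.variables)         # ['A', 'B', 'C']
print(candidate_edges(graph))  # [('A', 'B'), ('A', 'C')]
```

In this toy premise, B and C are correlated but become independent once A is known, so the sketch drops the B-C pair and keeps only the two pairs involving A. A model reasoning over such an explicit structure has the relevant statements laid out in one place rather than scattered through free text.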
Taxonomy
Topics: Bayesian Modeling and Causal Inference · Data Quality and Management · AI-based Problem Solving and Planning
