CauScientist: Teaching LLMs to Respect Data for Causal Discovery

Bo Peng; Sirui Chen; Lei Xu; Chaochao Lu

arXiv:2601.13614 · cs.CL · January 21, 2026

TL;DR

CauScientist introduces a collaborative framework combining large language models with statistical verification to improve causal discovery accuracy and robustness over existing data-driven methods.

Contribution

The paper presents CauScientist, a hybrid approach in which LLMs generate causal hypotheses and statistical methods validate them, improving causal discovery performance over purely data-driven baselines.

Findings

01. Up to 53.8% F1 score improvement over baselines

02. Recall increased from 35.0% to 100.0%

03. 44.0% reduction in structural Hamming distance (SHD) on complex graphs
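
For readers unfamiliar with the metrics named in these findings, the following is a minimal sketch of how edge-level F1, recall, and structural Hamming distance are commonly computed for directed graphs. It is not taken from the paper: the function names and the toy edge sets are hypothetical, and the SHD convention (a reversed edge counts as one error) is an assumption, since conventions differ across papers.

```python
def edge_metrics(true_edges, pred_edges):
    """Precision, recall, and F1 over directed edges given as (parent, child) pairs."""
    true_edges, pred_edges = set(true_edges), set(pred_edges)
    tp = len(true_edges & pred_edges)  # edges recovered with the correct direction
    precision = tp / len(pred_edges) if pred_edges else 0.0
    recall = tp / len(true_edges) if true_edges else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def shd(true_edges, pred_edges):
    """Structural Hamming distance.

    Assumed convention: each node pair whose edge status differs
    (missing, extra, or reversed) contributes exactly 1.
    """
    true_edges, pred_edges = set(true_edges), set(pred_edges)
    pairs = {frozenset(e) for e in true_edges | pred_edges}
    distance = 0
    for pair in pairs:
        a, b = tuple(pair)
        in_true = ((a, b) in true_edges, (b, a) in true_edges)
        in_pred = ((a, b) in pred_edges, (b, a) in pred_edges)
        if in_true != in_pred:
            distance += 1
    return distance


# Hypothetical toy graphs, for illustration only.
truth = {("A", "B"), ("B", "C"), ("A", "C")}
guess = {("A", "B"), ("C", "B"), ("C", "D")}
precision, recall, f1 = edge_metrics(truth, guess)
distance = shd(truth, guess)
```

In this toy case one edge is correct, one is reversed, one is missing, and one is extra, so lower SHD and higher F1 both indicate a graph closer to the ground truth.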

Abstract

Causal discovery is fundamental to scientific understanding and reliable decision-making. Existing approaches face critical limitations: purely data-driven methods suffer from statistical indistinguishability and modeling assumptions, while recent LLM-based methods either ignore statistical evidence or incorporate unverified priors that can mislead results. To this end, we propose CauScientist, a collaborative framework that synergizes LLMs as hypothesis-generating "data scientists" with probabilistic statistics as rigorous "verifiers". CauScientist employs hybrid initialization to select superior starting graphs, iteratively refines structures through LLM-proposed modifications validated by statistical criteria, and maintains an error memory to guide efficient exploration of the search space. Experiments demonstrate that CauScientist substantially outperforms purely data-driven baselines, achieving up to…
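
The propose-and-verify loop described in the abstract can be illustrated with a small sketch. This is not the authors' implementation. It rests on several assumptions: the statistical criterion is taken to be a linear-Gaussian BIC score, the LLM is replaced by a hypothetical `propose` stub that enumerates single-edge edits, "hybrid initialization" is reduced to picking the best-scoring graph from a list of candidates, and the error memory is a plain set of rejected edits that is cleared whenever the graph changes. All function names are illustrative.

```python
import itertools
import numpy as np


def bic(data, edges):
    """Linear-Gaussian BIC of a DAG given as (parent, child) index pairs; higher is better."""
    n, d = data.shape
    total = 0.0
    for j in range(d):
        parents = sorted(a for a, b in edges if b == j)
        X = np.column_stack([np.ones(n)] + [data[:, p] for p in parents])
        beta, *_ = np.linalg.lstsq(X, data[:, j], rcond=None)
        resid = data[:, j] - X @ beta
        sigma2 = max(float(resid @ resid) / n, 1e-12)
        loglik = -0.5 * n * (np.log(2 * np.pi * sigma2) + 1)
        total += loglik - 0.5 * X.shape[1] * np.log(n)
    return total


def is_acyclic(edges, d):
    """Depth-first cycle check over d nodes."""
    children = {i: [] for i in range(d)}
    for a, b in edges:
        children[a].append(b)
    state = [0] * d  # 0 = unvisited, 1 = on the current path, 2 = finished

    def visit(u):
        if state[u] == 1:
            return False
        if state[u] == 2:
            return True
        state[u] = 1
        ok = all(visit(v) for v in children[u])
        state[u] = 2
        return ok

    return all(visit(u) for u in range(d))


def propose(edges, d, memory):
    """Hypothetical stand-in for the LLM: return one untried single-edge edit, or None."""
    for a, b in itertools.permutations(range(d), 2):
        if (a, b) in edges:
            ops = [("remove", a, b), ("reverse", a, b)]
        elif (b, a) not in edges:
            ops = [("add", a, b)]
        else:
            ops = []
        for op in ops:
            if op not in memory:
                return op
    return None


def apply_op(edges, op):
    kind, a, b = op
    new = set(edges)
    if kind == "add":
        new.add((a, b))
    else:
        new.discard((a, b))
        if kind == "reverse":
            new.add((b, a))
    return new


def refine(data, init_edges, max_steps=200):
    """Accept a proposed edit only if it keeps the graph acyclic and raises the score."""
    d = data.shape[1]
    edges, best = set(init_edges), bic(data, init_edges)
    memory = set()  # rejected edits for the current graph (simplified error memory)
    for _ in range(max_steps):
        op = propose(edges, d, memory)
        if op is None:
            break
        cand = apply_op(edges, op)
        if is_acyclic(cand, d):
            s = bic(data, cand)
            if s > best:
                edges, best = cand, s
                memory.clear()  # the graph changed, so earlier rejections may no longer hold
                continue
        memory.add(op)
    return edges, best


# Synthetic chain X0 -> X1 -> X2, for illustration only.
rng = np.random.default_rng(0)
x0 = rng.normal(size=500)
x1 = 2.0 * x0 + rng.normal(size=500)
x2 = -1.5 * x1 + rng.normal(size=500)
data = np.column_stack([x0, x1, x2])

candidates = [set(), {(0, 2)}]  # stand-ins for the candidate starting graphs
init = max(candidates, key=lambda e: bic(data, e))
edges, best = refine(data, init)
```

Because BIC cannot distinguish Markov-equivalent graphs, a score-only verifier like this one may return edges in either direction along the chain; the abstract's point is that an LLM proposer can supply the domain knowledge that breaks such ties, while the statistical check filters out proposals the data do not support.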

Taxonomy

Topics: Bayesian Modeling and Causal Inference · Advanced Graph Neural Networks · Explainable Artificial Intelligence (XAI)