Can Post-Training Transform LLMs into Causal Reasoners?

Junqi Chen; Sirui Chen; Chaochao Lu

arXiv:2602.06337·cs.CL·February 9, 2026

TL;DR

This paper shows that targeted post-training substantially improves the ability of large language models to perform causal inference: a 14B-parameter model reaches 93.5% accuracy on the CaLM benchmark, and post-trained models stay accurate and robust across benchmarks and real-world scenarios.

Contribution

It introduces CauGym, a dataset comprising seven core causal tasks for training and five diverse test sets, and uses it to systematically evaluate five post-training methods (SFT, DPO, KTO, PPO, and GRPO), showing that they improve LLM causal reasoning.

Findings

01 Smaller LLMs can outperform larger models with proper post-training.

02 A 14B-parameter model achieved 93.5% accuracy on the CaLM benchmark.

03 Post-trained LLMs show strong generalization and robustness.

Abstract

Causal inference is essential for decision-making but remains challenging for non-experts. While large language models (LLMs) show promise in this domain, their precise causal estimation capabilities are still limited, and the impact of post-training on these abilities is insufficiently explored. This paper examines the extent to which post-training can enhance LLMs' capacity for causal inference. We introduce CauGym, a comprehensive dataset comprising seven core causal tasks for training and five diverse test sets. Using this dataset, we systematically evaluate five post-training approaches: SFT, DPO, KTO, PPO, and GRPO. Across five in-domain and four existing benchmarks, our experiments demonstrate that appropriate post-training enables smaller LLMs to perform causal inference competitively, often surpassing much larger models. Our 14B parameter model achieves 93.5% accuracy on the…

Code & Models

Models

Hugging Face: OpenCausaLab/CauGym (model)

Taxonomy

Topics: Bayesian Modeling and Causal Inference · Explainable Artificial Intelligence (XAI) · Topic Modeling