Mitigating Spurious Correlations in LLMs via Causality-Aware Post-Training

Shurui Gui; Shuiwang Ji

arXiv:2506.09433·cs.LG·June 12, 2025

Open Access · 3 Reviews

TL;DR

This paper introduces causality-aware post-training (CAPT), which reduces spurious correlations in large language models and improves their out-of-distribution generalization using only a small number of fine-tuning samples.

Contribution

It proposes a causality-aware post-training method that decomposes a biased prediction into two unbiased steps, event estimation and event intervention, to improve model robustness and generalization.

Findings

1. CAPT improves the OOD performance of 3B-scale LLMs.
2. Using only 100 in-distribution fine-tuning samples, CAPT outperforms traditional SFT.
3. CAPT improves generalization without introducing additional fine-tuning biases.

Abstract

While large language models (LLMs) have demonstrated remarkable capabilities in language modeling, recent studies reveal that they often fail on out-of-distribution (OOD) samples due to spurious correlations acquired during pre-training. Here, we aim to mitigate such spurious correlations through causality-aware post-training (CAPT). By decomposing a biased prediction into two unbiased steps, known as event estimation and event intervention, we reduce LLMs' pre-training biases without incurring additional fine-tuning biases, thus enhancing the model's generalization ability. Experiments on the formal causal inference benchmark CLadder and the logical reasoning dataset PrOntoQA show that 3B-scale language models fine-tuned with CAPT can outperform both traditional SFT and larger LLMs on in-distribution (ID) and OOD tasks using only 100 ID fine-tuning samples,…
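The two-step decomposition described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the function name `intervene_on_events`, the `E1`, `E2`, ... symbol scheme, and the example prompt are all assumptions made for illustration. Event estimation is stubbed out as a hand-written list of event phrases (in the paper this step is performed by a model), and event intervention is approximated by replacing each event phrase with an abstract symbol, following the reviewers' description of "replacing event variables with abstract values".

```python
import re
from typing import Dict, List, Tuple


def intervene_on_events(prompt: str, events: List[str]) -> Tuple[str, Dict[str, str]]:
    """Replace each concrete event phrase in `prompt` with an abstract symbol.

    Hypothetical helper: `events` stands in for the output of an event
    estimation step, which is not implemented here.
    """
    if not events:
        return prompt, {}

    # Assign symbols E1, E2, ... in the order the events were given.
    mapping = {event.lower(): f"E{i}" for i, event in enumerate(events, start=1)}

    # Match longer phrases first so a short phrase cannot split a longer one.
    alternatives = sorted(mapping, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(e) for e in alternatives) + r")\b",
        re.IGNORECASE,
    )

    # Single pass over the prompt, so inserted symbols are never re-matched.
    abstracted = pattern.sub(lambda m: mapping[m.group(0).lower()], prompt)
    return abstracted, mapping


# Step 1 (event estimation), written by hand here as an assumption.
events = ["smoking", "tar deposits", "lung cancer"]
prompt = (
    "Smoking causes tar deposits. Tar deposits cause lung cancer. "
    "Does smoking cause lung cancer?"
)

# Step 2 (event intervention): the abstracted prompt keeps the stated
# structure but drops the event names the model may be biased toward.
abstracted, mapping = intervene_on_events(prompt, events)
print(abstracted)
```

The intuition, under these assumptions, is that a model answering the abstracted prompt has to rely on the stated relations between `E1`, `E2`, and `E3` rather than on associations it learned for the concrete event names during pre-training.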

Peer Reviews

Decision·Submitted to ICLR 2026

Reviewer 01 · Rating 4 · Confidence 3

Strengths

1. The paper proposes an interesting approach based on causality for mitigating spurious correlations. While the general theory is not novel, its proposed instantiation is an original solution to a crucial problem in LLMs and machine learning in general.
2. The theoretical elements of the paper are intuitive and well justified at a high level (although some clarifications could be made), and the experiments are extensive and support the claims made.
3. The paper is well explained and easy to read.

Weaknesses

1. While the approach is intuitive, some of its theoretical justification is unclear. The authors perform an intervention on a confounding factor $E$ between Equations (4) and (5) but do not represent it as a do-operation, making the connection between the two equations unclear and lacking justification at a theoretical level. Indeed, the true causal graph cannot directly be observed, especially $S$ or $E$. The claim that replacing event variables with abstract values amounts to intervening on $

Reviewer 02 · Rating 2 · Confidence 4

Strengths

1. Incorporating causality into LLMs is an interesting direction.
2. The experimental setup is described in detail, which should facilitate replication of the results.
3. While there are some theoretical and technical flaws with Figures 1 and 2, their format is clear and helps readers understand the motivation behind the proposed method effectively.

Weaknesses

1. The proposed SCM raises significant concerns regarding both its correctness and normalization. (1) The definitions of $X$, $Y$, and $E$ are somewhat reasonable, with $X$ representing the prompt, $Y$ representing the answer, and $E$ representing events (which should be a multidimensional variable or a list of events, but it is not clearly depicted in the figure). However, the definition of $S$ is incorrect. The authors use $S$ to represent the underlying logic between multiple events ($E$) and

Reviewer 03 · Rating 6 · Confidence 4

Strengths

1. The paper offers a clear and elegant causal perspective on how spurious correlations arise during large-scale pretraining. By introducing an SCM that separates event variables and causal structures, it reframes the debiasing problem from a purely statistical to a causal reasoning viewpoint.
2. Experimental results show that CAPT significantly improves OOD performance on causal and logical reasoning benchmarks (e.g., CLadder and PrOntoQA), even with very limited fine-tuning data.

Weaknesses

1. CAPT relies heavily on the accuracy of the event estimation step, but the paper does not evaluate or analyze this component. If the event extraction model produces incorrect or inconsistent results, these errors will propagate through the entire causal pipeline, undermining the validity of the subsequent intervention and reasoning stages. I think an error analysis or ablation study on event estimation quality would be necessary to ensure the framework’s robustness.
2. While the paper provides

Taxonomy

Topics: Topic Modeling · Explainable Artificial Intelligence (XAI) · Artificial Intelligence in Healthcare and Education