Mitigating Spurious Correlations in LLMs via Causality-Aware Post-Training
Shurui Gui, Shuiwang Ji

TL;DR
This paper introduces causality-aware post-training (CAPT) to reduce spurious correlations that large language models acquire during pre-training, improving their out-of-distribution generalization with only 100 in-distribution fine-tuning samples.
Contribution
It proposes a causality-aware post-training method that decomposes a biased prediction into two unbiased steps, event estimation and event intervention, reducing pre-training biases without introducing additional fine-tuning biases.
Findings
CAPT improves the OOD performance of 3B-scale LLMs on CLadder and PrOntoQA.
CAPT outperforms both traditional SFT and larger LLMs using only 100 ID fine-tuning samples.
CAPT enhances generalization without additional fine-tuning biases.
Abstract
While large language models (LLMs) have demonstrated remarkable capabilities in language modeling, recent studies reveal that they often fail on out-of-distribution (OOD) samples due to spurious correlations acquired during pre-training. Here, we aim to mitigate such spurious correlations through causality-aware post-training (CAPT). By decomposing a biased prediction into two unbiased steps, known as event estimation and event intervention, we reduce LLMs' pre-training biases without incurring additional fine-tuning biases, thus enhancing the model's generalization ability. Experiments on the formal causal inference benchmark CLadder and the logical reasoning dataset PrOntoQA show that 3B-scale language models fine-tuned with CAPT can outperform both traditional SFT and larger LLMs on in-distribution (ID) and OOD tasks using only 100 ID fine-tuning samples,…
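The two-step decomposition described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the function names `estimate_events` and `intervene_on_events` are hypothetical, event estimation is approximated here by plain string matching against a supplied list (the paper presumably uses a model for this step), and the intervention is approximated by replacing each event phrase with an abstract symbol, as one of the reviews below describes it.

```python
import re
from string import ascii_uppercase


def estimate_events(prompt: str, candidate_events: list[str]) -> list[str]:
    """Toy stand-in for event estimation (an assumption, not the paper's method).

    Returns the candidate event phrases that occur in the prompt,
    ordered by their first appearance.
    """
    lowered = prompt.lower()
    found = [(lowered.find(e.lower()), e) for e in candidate_events]
    return [event for pos, event in sorted(found) if pos >= 0]


def intervene_on_events(prompt: str, events: list[str]) -> tuple[str, dict[str, str]]:
    """Toy stand-in for event intervention.

    Replaces each event phrase with an abstract symbol (A, B, C, ...),
    so that only the logical structure of the prompt remains.
    Assumes at most 26 events and no event phrase nested inside another.
    """
    mapping: dict[str, str] = {}
    abstract = prompt
    for symbol, event in zip(ascii_uppercase, events):
        mapping[event] = symbol
        abstract = re.sub(re.escape(event), symbol, abstract, flags=re.IGNORECASE)
    return abstract, mapping


prompt = (
    "Smoking causes tar deposits. Tar deposits cause lung cancer. "
    "Does smoking cause lung cancer?"
)
events = estimate_events(prompt, ["lung cancer", "smoking", "tar deposits"])
abstract_prompt, mapping = intervene_on_events(prompt, events)
print(abstract_prompt)  # A causes B. B cause C. Does A cause C?
print(mapping)
```

In this sketch the abstracted prompt no longer mentions smoking or cancer, so an answer to it cannot lean on associations between those specific events that a model may have picked up during pre-training.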
Peer Reviews
Decision: Submitted to ICLR 2026
1. The paper proposes an interesting approach based on causality for mitigating spurious correlations. While the general theory is not novel, its proposed instantiation is an original solution to a crucial problem in LLMs and machine learning in general. 2. The theoretical elements of the paper are intuitive and well justified at a high level (although some clarifications could be made), and the experiments are extensive and support the claims made. 3. The paper is well explained and easy to read.
1. While the approach is intuitive, some of its theoretical justification is unclear. The authors perform an intervention on a confounding factor $E$ between Equations (4) and (5) but do not represent it as a do-operation, making the connection between the two equations unclear and lacking justification at a theoretical level. Indeed, the true causal graph cannot directly be observed, especially $S$ or $E$. The claim that replacing event variables with abstract values amounts to intervening on $
1. Incorporating causality into LLMs is an interesting direction. 2. The experimental setup is described in detail, which should facilitate replication of the results. 3. While there are some theoretical and technical flaws with Figures 1 and 2, their format is clear and helps readers understand the motivation behind the proposed method effectively.
1. The proposed SCM raises significant concerns regarding both its correctness and normalization. (1) The definitions of $X$, $Y$, and $E$ are somewhat reasonable, with $X$ representing the prompt, $Y$ representing the answer, and $E$ representing events (which should be a multidimensional variable or a list of events, but it is not clearly depicted in the figure). However, the definition of $S$ is incorrect. The authors use $S$ to represent the underlying logic between multiple events ($E$) and
1. The paper offers a clear and elegant causal perspective on how spurious correlations arise during large-scale pretraining. By introducing an SCM that separates event variables and causal structures, it reframes the debiasing problem from a purely statistical to a causal reasoning viewpoint. 2. Experimental results show that CAPT significantly improves OOD performance on causal and logical reasoning benchmarks (e.g., CLadder and PrOntoQA), even with very limited fine-tuning data.
1. CAPT relies heavily on the accuracy of the event estimation step, but the paper does not evaluate or analyze this component. If the event extraction model produces incorrect or inconsistent results, these errors will propagate through the entire causal pipeline, undermining the validity of the subsequent intervention and reasoning stages. I think an error analysis or ablation study on event estimation quality would be necessary to ensure the framework’s robustness. 2. While the paper provides
Taxonomy
Topics: Topic Modeling · Explainable Artificial Intelligence (XAI) · Artificial Intelligence in Healthcare and Education
