Assessing Robustness to Spurious Correlations in Post-Training Language Models

Julia Shuieh; Prasann Singhal; Apaar Shanker; John Heyer; George Pu; Samuel Denton

arXiv:2505.05704·cs.CL·May 12, 2025

Open Access

TL;DR

This paper evaluates how different post-training methods for language models handle spurious correlations, revealing that robustness varies by task type and correlation nature, with no single method being universally best.

Contribution

The paper systematically compares three post-training algorithms (SFT, DPO, and KTO) across diverse synthetic tasks and spuriousness conditions, offering insight into each method's robustness and limitations.

Findings

01

Preference-based methods show robustness in mathematical reasoning.

02

Supervised Fine-Tuning performs better on complex, context-rich tasks.

03

Model performance degrades as spurious correlations increase, though the extent of the degradation varies by method.

Abstract

Supervised and preference-based fine-tuning techniques have become popular for aligning large language models (LLMs) with user intent and correctness criteria. However, real-world training data often exhibits spurious correlations -- arising from biases, dataset artifacts, or other "shortcut" features -- that can compromise a model's performance or generalization. In this paper, we systematically evaluate three post-training algorithms -- Supervised Fine-Tuning (SFT), Direct Preference Optimization (DPO), and KTO (Kahneman-Tversky Optimization) -- across a diverse set of synthetic tasks and spuriousness conditions. Our tasks span mathematical reasoning, constrained instruction-following, and document-grounded question answering. We vary the degree of spurious correlation (10% vs. 90%) and investigate two forms of artifacts: "Feature Ambiguity" and "Distributional Narrowness." Our…
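To make the "degree of spurious correlation (10% vs. 90%)" concrete, here is a minimal sketch of one way such a condition could be built for a binary-labelled synthetic set. It is not the authors' construction, which this summary does not describe. The function names `inject_spurious` and `agreement`, the `[NOTE]` marker, and the reading of "degree" as the fraction of examples where a surface marker agrees with the label are all assumptions made for illustration.

```python
import random


def inject_spurious(examples, rate, marker="[NOTE]", seed=0):
    """Return copies of `examples` in which `marker` agrees with the label
    on exactly round(rate * n) examples (hypothetical construction).

    examples: list of (text, label) pairs, label in {0, 1}.
    rate:     fraction of examples where marker presence matches label == 1.
    """
    if not 0.0 <= rate <= 1.0:
        raise ValueError("rate must be in [0, 1]")
    rng = random.Random(seed)
    order = list(range(len(examples)))
    rng.shuffle(order)
    # Indices where the shortcut feature is aligned with the label.
    aligned = set(order[: round(rate * len(examples))])

    out = []
    for i, (text, label) in enumerate(examples):
        # Aligned: marker appears iff label is 1. Otherwise: the reverse.
        has_marker = (label == 1) if i in aligned else (label == 0)
        out.append((f"{marker} {text}" if has_marker else text, label))
    return out


def agreement(examples, marker="[NOTE]"):
    """Fraction of examples where marker presence matches label == 1."""
    hits = sum(text.startswith(marker) == (label == 1) for text, label in examples)
    return hits / len(examples)


if __name__ == "__main__":
    data = [(f"question {i}", i % 2) for i in range(1000)]
    for rate in (0.1, 0.9):
        print(rate, agreement(inject_spurious(data, rate)))
```

A model trained on the 90% variant can score well by keying on the marker alone, so a held-out set where the marker is uninformative is what would expose reliance on the shortcut.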

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications

Methods: Shrink and Fine-Tune · Sparse Evolutionary Training