Chunky Post-Training: Data Driven Failures of Generalization
Seoirse Murray, Allison Qi, Timothy Qian, John Schulman, Collin Burns, Sara Price

TL;DR
This paper investigates how diverse datasets used in large language model post-training can introduce unintended behaviors due to spurious correlations, and presents tools to identify and trace these failures back to specific data chunks.
Contribution
The paper introduces two tools for large language models: SURF, a black-box pipeline that surfaces unintended behaviors at run time, and TURF, which traces those failures back to specific post-training data.
Findings
Chunky post-training causes miscalibrated behaviors in models.
Spurious correlations in data chunks lead to unexpected model failures.
SURF and TURF trace failures to specific post-training data segments.
Abstract
LLM post-training involves many diverse datasets, each targeting a specific behavior. But these datasets encode incidental patterns alongside intended ones: correlations between formatting and content, narrow phrasings across diverse problems, and implicit associations arising from the discrete data curation process. These patterns are often invisible to developers yet salient to models, producing behaviors that surprise their creators, such as rejecting true facts presented in a particular question format. We call this chunky post-training: the model learns spurious correlations as a result of distinct chunks of post-training data. We introduce SURF, a black-box pipeline which surfaces these unintended behaviors at run time, and TURF, a tool that traces these failures back to specific post-training data. Applying these tools to frontier models (Claude 4.5, GPT-5.1, Grok 4.1, Gemini 3)…
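To make the idea of a format-to-behavior correlation concrete, here is a minimal toy sketch. It is not SURF or TURF, and nothing in it comes from the paper: the chunk names, the example prompts, the labels, and the helper functions `has_format` and `reject_rate_by_format` are all hypothetical. It only shows how one chunk that always pairs a question format with a rejection can make the format itself predictive of rejection.

```python
from collections import defaultdict

# Hypothetical toy post-training examples: (chunk, prompt, response_label).
# The "myth_busting" chunk always uses one question format and always rejects.
EXAMPLES = [
    ("myth_busting", "Is it true that the Sun orbits the Earth?", "reject"),
    ("myth_busting", "Is it true that humans use only 10% of their brains?", "reject"),
    ("myth_busting", "Is it true that lightning never strikes twice?", "reject"),
    ("general_qa", "What is the boiling point of water at sea level?", "answer"),
    ("general_qa", "Who wrote Pride and Prejudice?", "answer"),
    ("general_qa", "How many continents are there?", "answer"),
]


def has_format(prompt):
    """Return True if the prompt uses the 'Is it true that' question format."""
    return prompt.lower().startswith("is it true that")


def reject_rate_by_format(examples):
    """Compute the fraction of 'reject' labels with and without the format."""
    counts = defaultdict(lambda: [0, 0])  # format flag -> [rejects, total]
    for _chunk, prompt, label in examples:
        flag = has_format(prompt)
        counts[flag][0] += int(label == "reject")
        counts[flag][1] += 1
    return {flag: rejects / total for flag, (rejects, total) in counts.items()}


rates = reject_rate_by_format(EXAMPLES)
print(rates)
```

In this toy data the format perfectly predicts the label, so a learner could pick up "this phrasing means reject" rather than "this claim is false". That is the kind of incidental pattern the abstract describes, where a true fact asked in the same format might then be rejected.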
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Machine Learning and Algorithms
