Early Data Exposure Improves Robustness to Subsequent Fine-Tuning
Lawrence Feng, Gaurav R. Ghosal, Jacob Mitchell Springer, Ziqian Zhong, Aditi Raghunathan

TL;DR
This paper studies how early data exposure during upstream training makes a language model's capabilities more robust to subsequent fine-tuning, emphasizing preventative upstream training choices over downstream fixes.
Contribution
Through empirical and theoretical analysis, the paper shows that early exposure during post-training improves capability retention after fine-tuning. The experiments cover 135M and 1B models, two post-training domains, and two downstream fine-tuning tasks.
Findings
Early exposure improves the retention of upstream capabilities after fine-tuning.
Immediate post-training performance does not reliably predict retention after subsequent fine-tuning.
Replay and dropout methods complement early exposure in mitigating forgetting.
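The second finding can be made concrete with a small sketch. It assumes one simple way to quantify retention: the ratio of a capability score after downstream fine-tuning to the same score immediately after post-training. This ratio, the `RecipeResult` class, the recipe names, and the numbers are all hypothetical placeholders for illustration; they are not the paper's metric or results.

```python
from dataclasses import dataclass


@dataclass
class RecipeResult:
    """Scores for one hypothetical training recipe on the target capability."""
    name: str
    score_after_post_training: float  # measured right after post-training
    score_after_fine_tuning: float    # measured after downstream fine-tuning

    def retention(self) -> float:
        """Fraction of the post-training score that survives fine-tuning.

        This ratio is an assumed definition, not necessarily the paper's.
        """
        if self.score_after_post_training <= 0:
            raise ValueError("post-training score must be positive")
        return self.score_after_fine_tuning / self.score_after_post_training


def rank_by_retention(results: list[RecipeResult]) -> list[RecipeResult]:
    """Sort recipes from highest to lowest retention."""
    return sorted(results, key=lambda r: r.retention(), reverse=True)


# Placeholder numbers only: two recipes that look identical immediately
# after post-training but differ once fine-tuning has been applied.
results = [
    RecipeResult("recipe_a", 0.80, 0.40),
    RecipeResult("recipe_b", 0.80, 0.60),
]
ranked = rank_by_retention(results)

for r in ranked:
    print(f"{r.name}: post-training={r.score_after_post_training:.2f}, "
          f"retention={r.retention():.2f}")
```

Comparing recipes on `score_after_post_training` alone would treat the two as equivalent; only the post-fine-tuning measurement separates them.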
Abstract
How can we train models whose post-trained capabilities survive subsequent fine-tuning? Rather than focusing on downstream interventions to mitigate forgetting of upstream capabilities, we study how upstream training choices (that is, the manner in which a capability is acquired) shape how robustly that capability is retained. We investigate this question in a controlled three-stage language-model pipeline: pretraining, post-training to acquire a target capability, and downstream fine-tuning on a new objective. Across 135M and 1B models, two post-training domains, and two downstream fine-tuning tasks, we find that immediate post-training performance does not reliably predict retention after subsequent fine-tuning: training recipes that look equivalent immediately after post-training can retain the target capability very differently after subsequent fine-tuning. In particular, early…
