Replaying pre-training data improves fine-tuning

Suhas Kotha; Percy Liang

arXiv:2603.04964·cs.CL·March 6, 2026

Open Access

TL;DR

Replaying generic pre-training data during fine-tuning improves target-task performance and target data efficiency, with larger gains when less target data appears in pre-training. The effect holds in controlled pre-training experiments and on real tasks.

Contribution

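This paper shows that replaying generic data during fine-tuning improves performance on the target task itself. Generic data is typically mixed in during fine-tuning only to prevent catastrophic forgetting of the generic domain, so the result suggests replay has a second, less expected benefit.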

Findings

01

Replay increases target data efficiency by up to 1.87x during fine-tuning.

02

Replay improves performance in real tasks, such as web navigation and question-answering.

03

Less target data in pre-training makes replay more beneficial.
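To make the efficiency multipliers concrete, the short sketch below converts them into an equivalent target-token budget. It assumes that "data efficiency of k×" means a run without replay would need roughly k times as many target tokens to reach the same target performance; that reading, and the helper name `effective_target_tokens`, are illustrative assumptions rather than definitions taken from the paper. The 4M-token budget and the 1.87× and 2.06× figures come from the abstract and are reported as upper bounds ("up to").

```python
def effective_target_tokens(target_tokens: int, efficiency_multiplier: float) -> float:
    """Target tokens a no-replay baseline would need to match a replay run.

    Assumes data efficiency is a simple multiplier on the target-token
    budget (an interpretation, not the paper's stated definition).
    """
    return target_tokens * efficiency_multiplier


TARGET_TOKENS = 4_000_000  # target-token budget reported in the abstract

fine_tuning = effective_target_tokens(TARGET_TOKENS, 1.87)   # fine-tuning multiplier
mid_training = effective_target_tokens(TARGET_TOKENS, 2.06)  # mid-training multiplier

print(f"fine-tuning equivalent:  {fine_tuning:,.0f} target tokens")
print(f"mid-training equivalent: {mid_training:,.0f} target tokens")
```

Under this reading, 4M target tokens with replay would be worth roughly 7.48M target tokens without it for fine-tuning, and roughly 8.24M for mid-training, in the best reported cases.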

Abstract

To obtain a language model for a target domain (e.g. math), the current paradigm is to pre-train on a vast amount of generic web text and then fine-tune on the relatively limited amount of target data. Typically, generic data is only mixed in during fine-tuning to prevent catastrophic forgetting of the generic domain. We surprisingly find that replaying the generic data during fine-tuning can actually improve performance on the (less related) target task. Concretely, in a controlled pre-training environment with 4M target tokens, 4B total tokens, and 150M parameter models, generic replay increases target data efficiency by up to $1.87 \times$ for fine-tuning and $2.06 \times$ for mid-training. We further analyze data schedules that introduce target data during pre-training and find that replay helps more when there is less target data present in pre-training. We demonstrate the success of…
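The replay idea described in the abstract can be pictured as a fine-tuning stream that interleaves target examples with generic pre-training examples. The sketch below is a minimal illustration under stated assumptions, not the authors' implementation: the function name `replay_mixture`, the `replay_fraction` parameter, the value 0.5, and sampling with replacement are all hypothetical choices made for this example.

```python
import random
from typing import Iterator, Sequence


def replay_mixture(
    target: Sequence[str],
    generic: Sequence[str],
    replay_fraction: float,
    num_examples: int,
    seed: int = 0,
) -> Iterator[tuple[str, str]]:
    """Yield (source, example) pairs for a fine-tuning stream with replay.

    Each example is drawn from the generic pool with probability
    `replay_fraction`, otherwise from the target pool. Sampling is with
    replacement (an assumption made to keep the sketch short).
    """
    if not 0.0 <= replay_fraction < 1.0:
        raise ValueError("replay_fraction must be in [0, 1)")
    rng = random.Random(seed)  # local RNG so the stream is reproducible
    for _ in range(num_examples):
        if rng.random() < replay_fraction:
            yield "generic", rng.choice(generic)
        else:
            yield "target", rng.choice(target)


# Hypothetical toy data standing in for target (e.g. math) and generic web text.
target_docs = ["math doc 1", "math doc 2"]
generic_docs = ["web doc 1", "web doc 2", "web doc 3"]

stream = list(replay_mixture(target_docs, generic_docs, replay_fraction=0.5, num_examples=1000))
generic_count = sum(1 for source, _ in stream if source == "generic")
print(f"{generic_count} of {len(stream)} examples are replayed generic data")
```

Setting `replay_fraction=0.0` recovers plain fine-tuning on target data only, which gives a natural baseline for comparing against a replay run.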

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications