Context-Free Synthetic Data Mitigates Forgetting
Parikshit Bansal, Sujay Sanghavi

TL;DR
This paper introduces a context-free synthetic data generation method that mitigates catastrophic forgetting in language models during fine-tuning, particularly when the original training data is unavailable, by approximating the KL divergence between the original and fine-tuned models.
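One way to see why sampling from the original model can stand in for its training data is the short derivation below. It is a sketch under stated assumptions, not the paper's exact formulation. It assumes the penalty is the forward divergence KL(p_0 ‖ p_θ) over whole sequences, where p_0 is the original model, p_θ the model being fine-tuned, and "context-free" means generating from p_0 with an empty prompt (written ∅). The weight λ and the task loss symbol are illustrative.

```latex
% Forward KL over full sequences x, and its Monte Carlo estimate from N
% context-free samples x_i drawn from the original model with an empty prompt.
\mathrm{KL}(p_0 \,\|\, p_\theta)
  = \mathbb{E}_{x \sim p_0}\!\left[\log p_0(x) - \log p_\theta(x)\right]
  \approx \frac{1}{N}\sum_{i=1}^{N}\left[\log p_0(x_i) - \log p_\theta(x_i)\right],
  \qquad x_i \sim p_0(\cdot \mid \varnothing)

% \log p_0 does not depend on \theta, so the penalized objective
\mathcal{L}(\theta) = \mathcal{L}_{\text{task}}(\theta) + \lambda\,\mathrm{KL}(p_0 \,\|\, p_\theta)

% has, in expectation, the same gradient as
\mathcal{L}_{\text{task}}(\theta) - \frac{\lambda}{N}\sum_{i=1}^{N}\log p_\theta(x_i)
```

Under these assumptions, the second term is ordinary next-token loss on the context-free samples, so penalizing the KL amounts to mixing those samples into the fine-tuning dataset, with λ acting like a mixing ratio.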
Contribution
The paper proposes a simple, effective context-free generation technique that estimates the KL divergence from the original model, and thereby reduces forgetting, without access to the original training data.
Findings
Context-free synthetic data reduces forgetting in fine-tuned models.
Augmenting datasets with context-free data preserves zero-shot and reasoning performance.
Contextual synthetic data and partial pretraining data are less effective than context-free data at mitigating forgetting.
Abstract
Fine-tuning a language model often results in a degradation of its existing performance on other tasks, due to a shift in the model parameters; this phenomenon is often referred to as (catastrophic) forgetting. We are interested in mitigating this, in settings where we only have access to the model weights but no access to its training data/recipe. A natural approach is to penalize the KL divergence between the original model and the new one. Our main realization is that a simple process - which we term context-free generation - allows for an approximately unbiased estimate of this KL divergence. We show that augmenting a fine-tuning dataset with context-free generations mitigates forgetting, in two settings: (a) preserving the zero-shot performance of pretrained-only models, and (b) preserving the reasoning performance of thinking models. We show that contextual synthetic data, and…
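The claim that context-free generation yields an unbiased KL estimate can be checked on a toy problem. The sketch below is illustrative, not the authors' code: two hypothetical first-order Markov chains (`original`, `finetuned`) stand in for the original and fine-tuned language models, `<bos>` plays the role of the empty context, and `MAX_LEN` truncates generation. It assumes the forward divergence KL(original ‖ finetuned) and compares the exact value, computed by enumerating every sequence, with a Monte Carlo average over context-free samples.

```python
import math
import random

# Hypothetical toy "language models": first-order Markov chains over the
# tokens "a" and "b" plus an end-of-sequence token. "<bos>" is the empty context.
TOKENS = ["a", "b", "<eos>"]
MAX_LEN = 4  # generation stops after this many tokens (assumed truncation)

original = {  # stands in for the original model p_0
    "<bos>": [0.5, 0.4, 0.1],
    "a": [0.3, 0.5, 0.2],
    "b": [0.6, 0.2, 0.2],
}
finetuned = {  # stands in for the fine-tuned model p_theta
    "<bos>": [0.7, 0.2, 0.1],
    "a": [0.5, 0.3, 0.2],
    "b": [0.4, 0.3, 0.3],
}


def log_prob(model, seq):
    """Log-probability of a finished sequence (tokens only, no <eos>)."""
    prev, total = "<bos>", 0.0
    for tok in seq:
        total += math.log(model[prev][TOKENS.index(tok)])
        prev = tok
    if len(seq) < MAX_LEN:  # shorter sequences ended by emitting <eos>
        total += math.log(model[prev][TOKENS.index("<eos>")])
    return total


def sample_context_free(model, rng):
    """Generate from <bos> alone, i.e. with no prompt or task context."""
    prev, seq = "<bos>", []
    while len(seq) < MAX_LEN:
        tok = rng.choices(TOKENS, weights=model[prev])[0]
        if tok == "<eos>":
            break
        seq.append(tok)
        prev = tok
    return seq


def all_sequences():
    """Enumerate every sequence of length 0..MAX_LEN (for the exact KL)."""
    seqs, frontier = [[]], [[]]
    for _ in range(MAX_LEN):
        frontier = [s + [t] for s in frontier for t in ("a", "b")]
        seqs.extend(frontier)
    return seqs


def exact_kl(p, q):
    """KL(p || q) summed over all sequences."""
    return sum(
        math.exp(log_prob(p, s)) * (log_prob(p, s) - log_prob(q, s))
        for s in all_sequences()
    )


def mc_kl(p, q, n, seed=0):
    """Unbiased Monte Carlo estimate of KL(p || q) from context-free samples of p."""
    rng = random.Random(seed)
    samples = [sample_context_free(p, rng) for _ in range(n)]
    return sum(log_prob(p, s) - log_prob(q, s) for s in samples) / n


if __name__ == "__main__":
    print(f"exact KL:        {exact_kl(original, finetuned):.4f}")
    print(f"MC KL (n=20000): {mc_kl(original, finetuned, 20000):.4f}")
```

The two printed values should agree to within sampling error. With real models, `log_prob` would presumably be replaced by summed token log-likelihoods from each network and `sample_context_free` by generation from a prompt containing only the BOS token; both substitutions are assumptions about how the idea transfers, not details taken from the paper.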
