Improved Supervised Fine-Tuning for Large Language Models to Mitigate Catastrophic Forgetting
Fei Ding, Baiqiao Wang

TL;DR
This paper introduces a supervised fine-tuning method for large language models that reduces catastrophic forgetting by synthesizing a high-quality general-purpose dataset, improving both general and domain-specific performance without access to the original SFT data.
Contribution
The proposed method reconstructs instruction distributions and synthesizes datasets to mitigate forgetting, offering a cost-effective solution without access to original SFT data.
Findings
Preserves general capabilities of LLMs after fine-tuning.
Enhances task-specific performance with synthetic datasets.
Outperforms baselines that rely on publicly available SFT datasets.
Abstract
Supervised Fine-Tuning (SFT) is a critical step for enhancing the instruction-following capabilities of Large Language Models (LLMs) and adapting them to specialized domains. However, SFT often leads to a degradation of the model's general abilities, a phenomenon known as catastrophic forgetting. This problem is exacerbated when third-party practitioners fine-tune open-source models, as the original SFT data is typically not available. To address this challenge, we propose a novel and cost-effective SFT method that effectively mitigates catastrophic forgetting without requiring access to the original SFT data. Our approach first reconstructs the likely instruction distribution of the base model. It then employs a multi-model generation and filtering pipeline to synthesize a high-quality general-purpose dataset. This synthetic dataset is mixed with new, domain-specific data for…
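The mixing step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `mix_datasets`, the `general_fraction` parameter, and the uniform sampling strategy are all assumptions made for illustration, and the excerpt above does not state what mixing ratio the paper uses.

```python
import random
from typing import Dict, List

Example = Dict[str, str]


def mix_datasets(
    domain: List[Example],
    general: List[Example],
    general_fraction: float = 0.5,  # assumed knob; the ratio is not given in the excerpt
    seed: int = 0,
) -> List[Example]:
    """Combine all domain examples with a random sample of general examples.

    The sample size is chosen so that roughly `general_fraction` of the
    returned set is general-purpose data (capped by what is available).
    """
    if not 0.0 <= general_fraction < 1.0:
        raise ValueError("general_fraction must be in [0, 1)")

    rng = random.Random(seed)

    # Solve g / (d + g) = f for g, where d = len(domain) and f = general_fraction.
    n_general = round(general_fraction * len(domain) / (1.0 - general_fraction))
    n_general = min(n_general, len(general))

    mixed = list(domain) + rng.sample(general, n_general)
    rng.shuffle(mixed)  # interleave so batches contain both kinds of data
    return mixed


if __name__ == "__main__":
    # Hypothetical toy data: 30 domain examples, 100 synthetic general examples.
    domain_data = [{"source": "domain", "instruction": f"d{i}"} for i in range(30)]
    general_data = [{"source": "general", "instruction": f"g{i}"} for i in range(100)]
    train_set = mix_datasets(domain_data, general_data, general_fraction=0.25)
    print(len(train_set))
```

Keeping every domain example and subsampling only the general pool reflects the usual situation in which domain data is the scarce resource; other choices, such as upsampling the domain set, are equally plausible.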
Taxonomy
Topics: Multimodal Machine Learning Applications · Topic Modeling · Domain Adaptation and Few-Shot Learning
