Learning from the Undesirable: Robust Adaptation of Language Models without Forgetting
Yunhun Nam, Jaehyung Kim, Jongheon Jeong

TL;DR
This paper introduces Learning-from-the-Undesirable (LfU), a regularization method for fine-tuning language models with limited data. LfU improves generalization and robustness by keeping the model's internal representations consistent under undesirable updates.
Contribution
LfU is a novel regularization scheme that improves language model adaptation by promoting resilience to undesirable model updates, which helps preserve existing capabilities and improve robustness.
Findings
Achieves a 16.8% average improvement on math tasks over vanilla SFT.
Reduces the variability of output performance under prompt variations by 92.1%.
Enhances model robustness and generalization with limited fine-tuning data.
Abstract
Language models (LMs) are often adapted through supervised fine-tuning (SFT) to specialize their capabilities for downstream tasks. However, in typical scenarios where the fine-tuning data is limited, e.g., compared to pre-training, SFT can lead LMs to overfit, causing them to rely on spurious patterns within the target task or to compromise other broadly useful capabilities as a side effect of narrow specialization. In this paper, we propose Learning-from-the-Undesirable (LfU), a simple yet effective regularization scheme for SFT to mitigate overfitting issues when fine-tuning LMs with limited data. Specifically, we aim to regularize the fine-tuning process to favor solutions that are resilient to "undesirable" model updates, e.g., gradient ascent steps that steer the model toward undesirable behaviors. To this end, we propose a novel form of consistency regularization that directly…
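The abstract describes the regularizer only in outline (its last sentence is truncated). What follows is a minimal NumPy sketch of one way such an objective could be set up, under assumptions of our own: a toy one-hidden-layer regression model stands in for the LM, mean-squared error stands in for the SFT loss, the "undesirable" update is a single gradient-ascent step on that loss, and the consistency term is a squared distance between hidden representations. Borrowing the first-order trick from sharpness-aware minimization (SAM), the consistency gradient is evaluated at the perturbed weights, with the unperturbed representation held fixed as the target, and then applied to the original weights. All function names and hyperparameters (`eta_adv`, `lam`, `lr`) are hypothetical, and the paper's exact formulation may differ.

```python
import numpy as np


def forward(W1, w2, X):
    """Toy stand-in for an LM: hidden representation H and a scalar output."""
    H = np.tanh(X @ W1)  # "internal representation", shape (n, k)
    return H, H @ w2     # prediction, shape (n,)


def task_loss_and_grads(W1, w2, X, y):
    """Mean-squared error (stand-in for the SFT loss) and its gradients."""
    n = X.shape[0]
    H, pred = forward(W1, w2, X)
    r = pred - y
    loss = np.mean(r ** 2)
    d_pred = 2.0 * r / n                          # dL/dpred
    g_w2 = H.T @ d_pred                           # dL/dw2
    d_Z = np.outer(d_pred, w2) * (1.0 - H ** 2)   # back through tanh
    g_W1 = X.T @ d_Z                              # dL/dW1
    return loss, g_W1, g_w2


def consistency_loss_and_grad(W1_eval, X, H_target):
    """Squared distance between representations at W1_eval and a fixed target.

    H_target is treated as a constant (stop-gradient); the gradient is taken
    with respect to W1_eval only.
    """
    n = X.shape[0]
    H = np.tanh(X @ W1_eval)
    diff = H - H_target
    loss = np.sum(diff ** 2) / n
    g_W1 = X.T @ ((2.0 * diff / n) * (1.0 - H ** 2))
    return loss, g_W1


def regularized_step(W1, w2, X, y, lr=0.1, eta_adv=0.5, lam=1.0):
    """One approximate gradient step on task loss + lam * consistency (hypothetical)."""
    loss, g_W1, g_w2 = task_loss_and_grads(W1, w2, X, y)
    # "Undesirable" update: one gradient-ASCENT step on the task loss.
    # Only W1 is perturbed, since only W1 affects the representation H.
    W1_bad = W1 + eta_adv * g_W1
    # Target: the current (unperturbed) representation, held fixed.
    H_ref, _ = forward(W1, w2, X)
    c, g_c = consistency_loss_and_grad(W1_bad, X, H_ref)
    # SAM-style first-order approximation: the gradient taken at W1_bad is
    # applied to W1, treating the ascent direction as a constant.
    W1_new = W1 - lr * (g_W1 + lam * g_c)
    w2_new = w2 - lr * g_w2
    return W1_new, w2_new, loss, c


def train(steps=200, lam=1.0, seed=0):
    """Fit a toy regression problem; returns (initial, final) task loss."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(64, 4))
    y = np.sin(X[:, 0])  # hypothetical toy target
    W1 = 0.5 * rng.normal(size=(4, 8))
    w2 = 0.5 * rng.normal(size=8)
    initial = task_loss_and_grads(W1, w2, X, y)[0]
    for _ in range(steps):
        W1, w2, _, _ = regularized_step(W1, w2, X, y, lam=lam)
    final = task_loss_and_grads(W1, w2, X, y)[0]
    return initial, final


initial, final = train()
print(f"task loss: {initial:.4f} -> {final:.4f}")
```

Passing `lam=0.0` turns off the consistency term and reduces the loop to plain gradient descent on the toy task loss, so the two variants can be compared on identical data.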
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Topic Modeling · Generative Adversarial Networks and Image Synthesis
