Chained Tuning Leads to Biased Forgetting

Megan Ung; Alicia Sun; Samuel J. Bell; Bhaktipriya Radharapu; Levent Sagun; Adina Williams

arXiv:2412.16469·cs.CL·December 30, 2024

TL;DR

This paper investigates how the order in which large language models are fine-tuned on successive tasks affects how well they retain safety and bias-related behavior. It finds that certain orderings lead to greater loss of safety information, and it proposes mitigation strategies.

Contribution

The paper introduces the concept of biased forgetting, systematically evaluates the effect of task order, and proposes mitigations that reduce the loss of safety-related information during chained fine-tuning.

Findings

01. Models forget safety information more when fine-tuned in certain orders.

02. Forgetting disproportionately affects safety data about specific groups.

03. Mitigation techniques can help recover safety knowledge after forgetting.

Abstract

Large language models (LLMs) are often fine-tuned for use on downstream tasks, though this can degrade capabilities learned during previous training. This phenomenon, often referred to as catastrophic forgetting, has important potential implications for the safety of deployed models. In this work, we first show that models trained on downstream tasks forget their safety tuning to a greater extent than models trained in the opposite order. Second, we show that forgetting disproportionately impacts safety information about certain groups. To quantify this phenomenon, we define a new metric we term biased forgetting. We conduct a systematic evaluation of the effects of task ordering on forgetting and apply mitigations that can help the model recover from the forgetting observed. We hope our findings can better inform methods for chaining the finetuning of LLMs in continual learning…
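The abstract names biased forgetting but does not give its formula. As a rough illustration only, the sketch below assumes that forgetting is the drop in a safety score (higher is safer) between a checkpoint before and after a later fine-tuning stage, computed separately for each demographic group, and that bias shows up as the gap between the most- and least-affected groups. The function names, the score scale, and the max-minus-min disparity are hypothetical stand-ins, not the paper's metric.

```python
"""Hypothetical sketch of per-group forgetting and a simple disparity measure.

This is NOT the paper's definition of biased forgetting; it is an
illustrative stand-in under the assumptions stated in the lead-in.
"""
from typing import Dict


def forgetting(before: float, after: float) -> float:
    """Drop in a safety score (higher = safer) after a later fine-tuning stage."""
    return before - after


def per_group_forgetting(
    before: Dict[str, float], after: Dict[str, float]
) -> Dict[str, float]:
    """Forgetting computed separately for each group present in `before`."""
    return {group: forgetting(before[group], after[group]) for group in before}


def forgetting_disparity(
    before: Dict[str, float], after: Dict[str, float]
) -> float:
    """Hypothetical disparity: largest minus smallest per-group forgetting.

    A value of 0.0 means every group lost the same amount of safety performance.
    """
    per_group = per_group_forgetting(before, after)
    return max(per_group.values()) - min(per_group.values())


if __name__ == "__main__":
    # Placeholder scores for illustration only; these are not results from the paper.
    before = {"group_a": 1.0, "group_b": 0.75}
    after = {"group_a": 0.75, "group_b": 0.25}
    print(per_group_forgetting(before, after))  # {'group_a': 0.25, 'group_b': 0.5}
    print(forgetting_disparity(before, after))  # 0.25
```

A max-minus-min gap is used here because it is easy to read; a ratio or variance across groups would be equally plausible choices under the same assumptions.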

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning