Unforgettable Generalization in Language Models

Eric Zhang, Leshem Choshen, and Jacob Andreas

arXiv:2409.02228 · cs.LG · September 5, 2024

TL;DR

This paper studies how language models forget skills when they are fine-tuned on randomized labels. It finds that whether forgetting generalizes beyond the training examples varies widely across tasks, identifies factors that predict how far it generalizes, and shows that the forgetting is shallow.

Contribution

It analyzes in detail how unpredictable and limited targeted skill removal is when a language model is fine-tuned on randomly labeled examples.

Findings

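1. In some tasks, such as entailment classification, forgetting generalizes robustly.

2. In other tasks, such as physical commonsense reasoning and scientific question answering, forgetting affects only the training examples, and models keep performing well on new task instances.

3. Low confidence in the model's initial task predictions and low variability in its representations of the training data both predict that forgetting will generalize more broadly.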
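The two predictors listed above are the model's initial confidence and the variability of its representations. The sketch below shows one way each could be quantified. It is illustrative only. It assumes confidence is the mean maximum softmax probability of the original, pre-forgetting model over the forgetting set. It assumes variability is the mean Euclidean distance of training-example representations from their centroid. The paper's exact measures may differ, and the function names are hypothetical.

```python
import math
from typing import Sequence


def softmax(logits: Sequence[float]) -> list[float]:
    """Numerically stable softmax over one example's class logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def mean_confidence(all_logits: Sequence[Sequence[float]]) -> float:
    """Assumed confidence measure: mean max-class probability of the
    original (pre-forgetting) model over the forgetting set."""
    return sum(max(softmax(logits)) for logits in all_logits) / len(all_logits)


def representation_variability(reps: Sequence[Sequence[float]]) -> float:
    """Assumed variability measure: mean Euclidean distance of each
    training-example representation from the centroid of all of them."""
    n, dim = len(reps), len(reps[0])
    centroid = [sum(r[j] for r in reps) / n for j in range(dim)]
    return sum(math.dist(r, centroid) for r in reps) / n
```

For example, `mean_confidence([[2.0, 0.0], [0.0, 0.0]])` averages two predictions: a fairly confident one (about 0.88) and a maximally uncertain one (0.5). Under the finding above, lower values of both quantities would point toward forgetting that spreads to new examples.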

Abstract

When language models (LMs) are trained to forget (or "unlearn") a skill, how precisely does their behavior change? We study the behavior of transformer LMs in which tasks have been forgotten via fine-tuning on randomized labels. Such LMs learn to generate near-random predictions for individual examples in the "training" set used for forgetting. Across tasks, however, LMs exhibit extreme variability in whether LM predictions change on examples outside the training set. In some tasks (like entailment classification), forgetting generalizes robustly, and causes models to produce uninformative predictions on new task instances; in other tasks (like physical commonsense reasoning and scientific question answering) forgetting affects only the training examples, and models continue to perform the "forgotten" task accurately even for examples very similar to those that appeared in the…
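The forgetting setup in the abstract can be sketched in outline. This is a minimal illustration, not the authors' implementation. It assumes a classification task stored as hypothetical `(input, label)` pairs and builds the forgetting set by replacing each label with a class drawn uniformly at random. It then defines a hypothetical `forgetting_generalization` score. The score compares held-out accuracy before and after forgetting, normalized by chance accuracy. The fine-tuning step itself depends on the model and training library, so it is omitted.

```python
import random
from typing import Callable, Sequence

Example = tuple[str, int]  # hypothetical (input text, class label) pair


def randomize_labels(examples: Sequence[Example], num_classes: int, seed: int = 0) -> list[Example]:
    """Keep each input but replace its label with one drawn uniformly at random.

    Assumption: labels are sampled uniformly over all classes, so a random
    label can coincide with the true one by chance.
    """
    rng = random.Random(seed)
    return [(x, rng.randrange(num_classes)) for x, _ in examples]


def accuracy(predict: Callable[[str], int], examples: Sequence[Example]) -> float:
    """Fraction of examples whose predicted label matches the gold label."""
    return sum(predict(x) == y for x, y in examples) / len(examples)


def forgetting_generalization(acc_before: float, acc_after: float, chance: float) -> float:
    """Hypothetical score: share of above-chance held-out accuracy removed by forgetting.

    1.0 means held-out accuracy fell all the way to chance (forgetting generalized);
    0.0 means held-out accuracy is unchanged (forgetting stayed on the training set).
    """
    headroom = acc_before - chance
    if headroom <= 0:
        raise ValueError("model must start above chance on held-out data")
    return (acc_before - acc_after) / headroom
```

On this hypothetical score, tasks like entailment classification would land near 1.0. Tasks where the model keeps answering new instances correctly would land near 0.0.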

Taxonomy

Topics: Natural Language Processing Techniques

Methods: Sparse Evolutionary Training