TOFU: A Task of Fictitious Unlearning for LLMs
Pratyush Maini, Zhili Feng, Avi Schwarzschild, Zachary C. Lipton, J. Zico Kolter

TL;DR
This paper introduces TOFU, a benchmark for evaluating unlearning methods in large language models, and highlights how existing approaches fall short of effectively forgetting specific data.
Contribution
It provides a synthetic dataset, a suite of metrics, and baseline results to facilitate research on effective unlearning in large language models.
Findings
Existing unlearning algorithms are ineffective at fully forgetting data.
The TOFU benchmark enables systematic evaluation of unlearning methods.
Baseline results show a need for improved unlearning techniques.
Abstract
Large language models trained on massive corpora of data from the web can memorize and reproduce sensitive or private data, raising both legal and ethical concerns. Unlearning, or tuning models to forget information present in their training data, provides us with a way to protect private data after training. Although several methods exist for such unlearning, it is unclear to what extent they result in models equivalent to those where the data to be forgotten was never learned in the first place. To address this challenge, we present TOFU, a Task of Fictitious Unlearning, as a benchmark aimed at helping deepen our understanding of unlearning. We offer a dataset of 200 diverse synthetic author profiles, each consisting of 20 question-answer pairs, and a subset of these profiles called the forget set that serves as the target for unlearning. We compile a suite of metrics that work…
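The dataset layout described in the abstract (200 author profiles, 20 question-answer pairs each, with a subset of profiles designated as the forget set) can be sketched roughly as follows. This is an illustrative sketch only, not the authors' code: the record fields, the helper names `build_placeholder_dataset` and `split_by_author`, and the 5% forget fraction are assumptions made for the example, and the question and answer strings are dummy placeholders rather than TOFU data.

```python
import random

NUM_AUTHORS = 200        # stated in the abstract
QA_PER_AUTHOR = 20       # stated in the abstract
FORGET_FRACTION = 0.05   # assumption: the text above does not give a fraction


def build_placeholder_dataset():
    """Build dummy QA records with the same shape as the described dataset."""
    return [
        {
            "author_id": author,
            "question": f"placeholder question {q} about author {author}",
            "answer": f"placeholder answer {q} about author {author}",
        }
        for author in range(NUM_AUTHORS)
        for q in range(QA_PER_AUTHOR)
    ]


def split_by_author(records, forget_fraction, seed=0):
    """Split records into (forget_set, retain_set) at the author level.

    All QA pairs of a selected author go to the forget set together,
    so no author appears on both sides of the split.
    """
    author_ids = sorted({r["author_id"] for r in records})
    rng = random.Random(seed)
    num_forget = round(len(author_ids) * forget_fraction)
    forget_authors = set(rng.sample(author_ids, num_forget))
    forget_set = [r for r in records if r["author_id"] in forget_authors]
    retain_set = [r for r in records if r["author_id"] not in forget_authors]
    return forget_set, retain_set


records = build_placeholder_dataset()
forget_set, retain_set = split_by_author(records, FORGET_FRACTION)
print(len(records), len(forget_set), len(retain_set))  # prints: 4000 200 3800
```

Splitting by author rather than by individual question keeps every fact about a forgotten profile on the same side, which matches the abstract's description of the forget set as a subset of profiles.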
Taxonomy
Topics: Topic Modeling · Privacy-Preserving Technologies in Data · Interpreting and Communication in Healthcare
Methods: Sparse Evolutionary Training
