Machine Unlearning Doesn't Do What You Think: Lessons for Generative AI Policy and Research

A. Feder Cooper; Christopher A. Choquette-Choo; Miranda Bogen; Kevin Klyman; Matthew Jagielski; Katja Filippova; Ken Liu; Alexandra Chouldechova; Jamie Hayes; Yangsibo Huang; Eleni Triantafillou; Peter Kairouz; Nicole Elyse Mitchell; Niloofar Mireshghallah; Abigail Z. Jacobs; James Grimmelmann; Vitaly Shmatikov; Christopher De Sa; Ilia Shumailov; Andreas Terzis; Solon Barocas; Jennifer Wortman Vaughan; danah boyd; Yejin Choi; Sanmi Koyejo; Fernando Delgado; Percy Liang; Daniel E. Ho; Pamela Samuelson; Miles Brundage; David Bau; Seth Neel; Hanna Wallach; Amy B. Cyphert; Mark A. Lemley; Nicolas Papernot; Katherine Lee

arXiv:2412.06966·cs.LG·February 26, 2026

Open Access

TL;DR

This paper critically examines the concept of machine unlearning in generative AI, highlighting its limitations and mismatches between goals and feasible implementations, and offers a framework for researchers and policymakers.

Contribution

It provides a framework to understand the challenges of machine unlearning and explains why it is not a universal solution for controlling AI model behavior.

Findings

01. Unlearning faces significant technical and substantive challenges.

02. There are fundamental mismatches between unlearning goals and what can be practically achieved.

03. Unlearning is not a comprehensive solution for managing generative AI outputs.

Abstract

"Machine unlearning" is a popular proposed solution for mitigating the existence of content in an AI model that is problematic for legal or moral reasons, including privacy, copyright, safety, and more. For example, unlearning is often invoked as a solution for removing the effects of specific information from a generative-AI model's parameters, e.g., a particular individual's personal data or the inclusion of copyrighted content in the model's training data. Unlearning is also proposed as a way to prevent a model from generating targeted types of information in its outputs, e.g., generations that closely resemble a particular individual's data or reflect the concept of "Spiderman." Both of these goals--the targeted removal of information from a model and the targeted suppression of information from a model's outputs--present various technical and substantive challenges. We provide a…

Taxonomy

Topics: Ethics and Social Impacts of AI · Explainable Artificial Intelligence (XAI)
