UnUnlearning: Unlearning is not sufficient for content regulation in advanced generative AI

Ilia Shumailov; Jamie Hayes; Eleni Triantafillou; Guillermo Ortiz-Jimenez; Nicolas Papernot; Matthew Jagielski; Itay Yona; Heidi Howard; Eugene Bagdasaryan

arXiv:2407.00106 · cs.LG · July 2, 2024 · 1 citation

Open Access

TL;DR

This paper argues that unlearning in large language models is insufficient for content regulation because unlearned knowledge can be reintroduced during inference, necessitating additional filtering methods.

Contribution

It introduces the concept of ununlearning, in which unlearned knowledge is reintroduced in-context, and argues that this undermines the effectiveness of unlearning for content regulation in LLMs.

Findings

1. Unlearning does not prevent models from performing impermissible acts during inference.

2. Ununlearning can reintroduce forgotten knowledge in-context.

3. Content filtering remains necessary despite unlearning efforts.
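
The three findings can be illustrated with a toy sketch. This is not the authors' method or any real LLM API. `ToyModel`, `unlearn`, `answer`, and `output_filter` are hypothetical names invented for illustration, and the "model" is just a lookup table that prefers information supplied in its context. It only aims to show why removing knowledge from the weights does not stop that knowledge from being supplied again at inference time, and why a filter on the output may still be needed.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set


@dataclass
class ToyModel:
    """Hypothetical stand-in for an LLM: a fact table plus optional context."""
    weights: Dict[str, str] = field(default_factory=dict)

    def unlearn(self, topic: str) -> None:
        # Toy analogue of unlearning: drop the fact from parametric memory.
        self.weights.pop(topic, None)

    def answer(self, topic: str, context: Optional[Dict[str, str]] = None) -> str:
        # Information supplied in-context is used regardless of the weights.
        if context and topic in context:
            return context[topic]
        return self.weights.get(topic, "I don't know.")


def output_filter(text: str, blocked_terms: Set[str]) -> str:
    # Toy content filter applied to the output, independent of the weights.
    if any(term in text.lower() for term in blocked_terms):
        return "[blocked]"
    return text


fact = "forbidden fact: X implies Y"
model = ToyModel(weights={"x": fact})

# After unlearning, the weights no longer hold the fact.
model.unlearn("x")
after_unlearning = model.answer("x")

# Ununlearning: the same fact is reintroduced through the context.
reintroduced = model.answer("x", context={"x": fact})

# A filter on the output still catches the reintroduced content.
filtered = output_filter(reintroduced, {"forbidden fact"})

print(after_unlearning)
print(reintroduced)
print(filtered)
```

In this sketch the unlearning step works as intended on the weights, yet the in-context path bypasses it entirely, which is the inconsistency the paper highlights. A real model would be far less literal than a lookup table, so this should be read only as a picture of the argument's structure.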

Abstract

Exact unlearning was first introduced as a privacy mechanism that allowed a user to retract their data from machine learning models on request. Shortly after, inexact schemes were proposed to mitigate the impractical costs associated with exact unlearning. More recently, unlearning is often discussed as an approach for removal of impermissible knowledge, i.e., knowledge that the model should not possess, such as unlicensed copyrighted, inaccurate, or malicious information. The promise is that if the model does not have a certain malicious capability, then it cannot be used for the associated malicious purpose. In this paper we revisit the paradigm in which unlearning is used in Large Language Models (LLMs) and highlight an underlying inconsistency arising from in-context learning. Unlearning can be an effective control mechanism for the training phase, yet it does not prevent the model…

Taxonomy

Topics: Reinforcement Learning in Robotics · Topic Modeling · Ethics and Social Impacts of AI