MEOW: MEMOry Supervised LLM Unlearning Via Inverted Facts

Tianle Gu; Kexin Huang; Ruilin Luo; Yuanqi Yao; Yujiu Yang; Yan Teng; Yingchun Wang

arXiv:2409.11844·cs.CL·September 19, 2024·2 cites

TL;DR

MEOW is a gradient descent-based method for unlearning sensitive information in LLMs. It generates inverted facts and selects the most effective ones, achieving higher forget quality with minimal loss of model utility.

Contribution

The paper proposes an unlearning approach based on inverted facts, together with a new metric, MEMO, that quantifies memorization in LLMs, aiming for more efficient and robust forgetting.

Findings

01. MEOW significantly improves forget quality on the TOFU benchmark.

02. MEOW maintains model utility with minimal degradation.

03. Slight improvements in NLU performance are observed with MEOW.

Abstract

Large Language Models (LLMs) can memorize sensitive information, raising concerns about potential misuse. LLM Unlearning, a post-hoc approach to remove this information from trained LLMs, offers a promising solution to mitigate these risks. However, previous practices face three key challenges: 1. Utility: successful unlearning often causes catastrophic collapse on unrelated tasks. 2. Efficiency: many methods either involve adding similarly sized models, which slows down unlearning or inference, or require retain data that are difficult to obtain. 3. Robustness: even effective methods may still leak data via extraction techniques. To address these challenges, we propose MEOW, a simple yet effective gradient descent-based unlearning method. Specifically, we use an offline LLM to generate a set of inverted facts. Then, we design a new metric, MEMO, to quantify memorization in LLMs.…
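The abstract is cut off before MEMO is defined, so the following is only a minimal sketch of what a memorization score of this general kind could look like, not the paper's metric. It assumes memorization is probed by prompting the model with a prefix of a known text and measuring how much of the true suffix reappears in the continuation. The names `overlap_recall`, `memorization_score`, `generate`, and `step` are hypothetical, and the simple token-overlap recall is a stand-in for whatever similarity measure the authors actually use.

```python
from typing import Callable, Dict, List


def overlap_recall(reference: List[str], candidate: List[str]) -> float:
    """Fraction of reference tokens that also occur in the candidate (clipped counts)."""
    if not reference:
        return 0.0
    remaining: Dict[str, int] = {}
    for tok in candidate:
        remaining[tok] = remaining.get(tok, 0) + 1
    hits = 0
    for tok in reference:
        if remaining.get(tok, 0) > 0:
            remaining[tok] -= 1
            hits += 1
    return hits / len(reference)


def memorization_score(generate: Callable[[str], str], text: str, step: int = 2) -> float:
    """Average overlap between the model's continuation and the true suffix.

    `generate` is a hypothetical stand-in for an LLM call: prompt in, continuation out.
    The text is cut every `step` whitespace tokens; each cut yields one prefix/suffix pair.
    """
    tokens = text.split()
    scores = []
    for cut in range(step, len(tokens), step):
        prefix = " ".join(tokens[:cut])
        suffix = tokens[cut:]
        continuation = generate(prefix).split()
        scores.append(overlap_recall(suffix, continuation))
    return sum(scores) / len(scores) if scores else 0.0


if __name__ == "__main__":
    fact = "the author was born in Paris in 1970"
    # Toy "model" that has memorized the fact verbatim.
    memorized = lambda prompt: fact[len(prompt):]
    # Toy "model" that produces nothing related to the fact.
    forgotten = lambda prompt: ""
    print(memorization_score(memorized, fact))
    print(memorization_score(forgotten, fact))
```

Under these assumptions, a model that reproduces the text verbatim scores at the top of the range and a model that produces nothing related scores at the bottom, which is the kind of signal an unlearning method could use to compare candidate inverted facts.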

Code & Models

Repositories

carol-gutianle/meow
PyTorch, official

Taxonomy

Topics: Natural Language Processing Techniques

Methods: Sparse Evolutionary Training