MUSE: Machine Unlearning Six-Way Evaluation for Language Models

Weijia Shi; Jaechan Lee; Yangsibo Huang; Sadhika Malladi; Jieyu Zhao; Ari Holtzman; Daogao Liu; Luke Zettlemoyer; Noah A. Smith; Chiyuan Zhang

arXiv:2407.06460·cs.CL·July 16, 2024·3 cites

Open Access · 3 Repos · 2 Datasets · 3 Reviews

TL;DR

This paper introduces MUSE, a comprehensive benchmark for evaluating machine unlearning algorithms on language models, highlighting their strengths and weaknesses across six key properties to improve privacy, utility, scalability, and sustainability.

Contribution

We propose MUSE, a new benchmark with six evaluation criteria for unlearning algorithms, and benchmark eight algorithms on large language models to assess their effectiveness and limitations.

Findings

01

Most algorithms prevent verbatim and knowledge memorization to some extent.

02

Only one algorithm effectively prevents privacy leakage.

03

Existing algorithms often degrade model utility and lack sustainability for sequential unlearning.

Abstract

Language models (LMs) are trained on vast amounts of text data, which may include private and copyrighted content. Data owners may request the removal of their data from a trained model due to privacy or copyright concerns. However, exactly unlearning only these datapoints (i.e., retraining with the data removed) is intractable in modern-day models. This has led to the development of many approximate unlearning algorithms. The evaluation of the efficacy of these algorithms has traditionally been narrow in scope, failing to precisely quantify the success and practicality of the algorithm from the perspectives of both the model deployers and the data owners. We address this issue by proposing MUSE, a comprehensive machine unlearning evaluation benchmark that enumerates six diverse desirable properties for unlearned models: (1) no verbatim memorization, (2) no knowledge memorization, (3)…
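To make property (1) concrete: the excerpt above does not say how verbatim memorization is scored, so the following is only a minimal sketch under an assumption, namely that memorization is approximated by a ROUGE-L-style overlap (F1 over the longest common subsequence of whitespace tokens) between a model's continuation and the true continuation from the forget set. The names `lcs_length` and `overlap_f1` are hypothetical and are not taken from the paper or its code.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def overlap_f1(generated, reference):
    """ROUGE-L-style F1 between two strings (assumed scoring rule).

    A score near 1.0 means the generated text closely reproduces the
    reference; a score near 0.0 means little shared token sequence.
    """
    gen, ref = generated.split(), reference.split()
    if not gen or not ref:
        return 0.0
    lcs = lcs_length(gen, ref)
    if lcs == 0:
        return 0.0
    precision = lcs / len(gen)
    recall = lcs / len(ref)
    return 2 * precision * recall / (precision + recall)
```

Under this assumption, an unlearned model would be expected to score lower on forget-set continuations than the original model did; how low is "low enough" is a question for the benchmark itself, not for this sketch.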

Peer Reviews

Decision: ICLR 2025 Poster

Reviewer 01 · Rating 6 · Confidence 4

Strengths

This paper tackles very important problems with solid efforts to build the benchmark. The six perspectives of the benchmark are impactful and clear. I enjoy reading this work and I am convinced by the experiments, which are sufficiently comprehensive and well-designed.

Weaknesses

Overall the weaknesses are not significant. I think the scale of data and number of methods could be extended further; for example, the conclusion about a method's effectiveness may change when the forget set gets larger. Also, the hyperparameter tuning may be sub-optimal and require more elaboration. In Figure 4, multiple lines use the same color, which is confusing. The blue curve in Figure 6 seems completely covered. Minor: at Line 414, should GA be GA_GDR?

Reviewer 02 · Rating 6 · Confidence 3

Strengths

1: This paper provides a comprehensive and detailed study of methods for machine unlearning, conducting an in-depth evaluation from six perspectives. It offers a thorough assessment framework covering aspects such as semantics, continuity, knowledge, memory, and privacy. Compared to previous evaluation frameworks, this approach has a broader scope, assesses from more perspectives, and utilizes a larger dataset, demonstrating the framework's comprehensiveness and effectiveness. 2: This paper pro

Weaknesses

1: Although the authors provide numerous metrics, many of them heavily rely on previous methods. For example, the C3 metric for privacy assessment has already been addressed in a series of earlier approaches. Additionally, it remains unclear whether certain metrics, such as C5 and C6, are truly essential for evaluating machine unlearning, as their explanations in the paper are not entirely clear. While I appreciate the advantages noted in Strengths, I would like to see more about how these vari

Reviewer 03 · Rating 8 · Confidence 4

Strengths

- What the authors propose is very helpful for the community. Plenty of work is focused on developing approximate unlearning methods for LLMs, and the evaluation methods employed are all too often ad hoc rather than comprehensive and rigorous. The setup the authors propose is well thought through and covers all meaningful dimensions (at least that I see).
- Thorough analysis of unlearning algorithms, for 2 datasets.
- I particularly like the realistic setup for utility preservation of Harry Potter.

Weaknesses

(For more, see questions.)
- Only evaluating one LLM, in one finetuning regime.
- No justification for the high MIA performances, which is needed to evaluate how realistic the setup and its conclusions are.
- Limited utility evaluation.
- Minor clarifications needed.

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling