Large Scale Knowledge Washing

Yu Wang; Ruihan Wu; Zexue He; Xiusi Chen; Julian McAuley

arXiv:2405.16720·cs.CL·February 18, 2025

TL;DR

This paper introduces LAW, a method for large-scale unlearning of specific knowledge in language models by updating MLP layers, effectively forgetting targeted information while preserving reasoning capabilities.

Contribution

The paper proposes LAW, a novel approach to unlearning knowledge in large language models by updating MLP layers, avoiding the drawbacks of traditional reverse-loss methods, whose extensive training can degrade the model's fluency and reasoning ability.

Findings

01

LAW effectively forgets targeted knowledge.

02

Maintains reasoning ability after unlearning.

03

Outperforms existing methods in knowledge unlearning tasks.

Abstract

Large language models show impressive abilities in memorizing world knowledge, which leads to concerns regarding memorization of private information, toxic or sensitive knowledge, and copyrighted content. We introduce the problem of Large Scale Knowledge Washing, focusing on unlearning an extensive amount of factual knowledge. Previous unlearning methods usually define the reverse loss and update the model via backpropagation, which may affect the model's fluency and reasoning ability or even destroy the model due to extensive training with the reverse loss. Existing works introduce additional data from downstream tasks to prevent the model from losing capabilities, which requires downstream task awareness. Controlling the tradeoff of unlearning and maintaining existing capabilities is also challenging. To this end, we propose LAW (Large Scale Washing) to update the MLP layers in…
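The abstract describes LAW as updating MLP layers in the spirit of model-editing methods rather than training with a reverse loss. As a rough illustration only, the sketch below adopts the common linear key-value view of an MLP projection (value = W @ key, as in ROME/MEMIT-style editing) and solves a ridge least-squares problem that keeps outputs on preserved keys K while moving outputs on keys K_w to a target V_w. This is not LAW's objective (its derivation, including Equation (8), is not reproduced on this page). The symbols K and K_w are borrowed from the reviews with assumed meanings; the function name `wash_update`, the ridge weight `lam`, and the zero targets are hypothetical choices.

```python
import numpy as np


def wash_update(W0, K, K_w, V_w, lam=1e-4):
    """Closed-form ridge update of one linear MLP projection (illustrative sketch).

    Solves
        min_W ||W K - W0 K||^2 + ||W K_w - V_w||^2 + lam * ||W - W0||^2
    where columns of K are keys whose outputs should be preserved and
    columns of K_w are keys whose outputs should be overwritten with V_w.
    """
    d_in = W0.shape[1]
    # Normal-equation matrix: second moment of all keys plus a ridge term.
    A = K @ K.T + K_w @ K_w.T + lam * np.eye(d_in)
    # Residual between desired and current outputs on the keys to wash.
    R = V_w - W0 @ K_w
    # Delta = R K_w^T A^{-1}; A is symmetric, so solve A X = K_w R^T instead of inverting.
    delta = np.linalg.solve(A, K_w @ R.T).T
    return W0 + delta


# Toy demo with random data (all shapes and values are hypothetical).
rng = np.random.default_rng(0)
d_in, d_out, n_keep, n_wash = 64, 32, 20, 5
W0 = rng.standard_normal((d_out, d_in)) / np.sqrt(d_in)
K = rng.standard_normal((d_in, n_keep))    # keys whose outputs we keep
K_w = rng.standard_normal((d_in, n_wash))  # keys of facts to wash
V_w = np.zeros((d_out, n_wash))            # hypothetical target: zero output
W = wash_update(W0, K, K_w, V_w)

print(np.abs(W @ K - W0 @ K).max())  # preserved outputs barely move
print(np.abs(W @ K_w - V_w).max())   # washed outputs land near the target
```

A larger `lam` trades exact fitting for a smaller change to `W0`. How keys would actually be extracted from hidden states is left open here, since the reviews note that the practical computation of K and K_w is not explained.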

Peer Reviews

Decision·ICLR 2025 Poster

Reviewer 01 · Rating 6 · Confidence 3

Strengths

Novelty: The paper proposes a novel objective function specifically designed to remove knowledge represented in triplet format, resulting in an approach distinct from existing methods.
Creation of a large-scale dataset: The development of Wiki-Latest, a new large-scale dataset derived from Wikipedia triplets, is a valuable contribution to the field.
Comprehensive evaluation: The paper presents a detailed comparative analysis with multiple existing methods.
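The review notes that knowledge is represented as triplets and that Wiki-Latest is derived from Wikipedia triplets. A minimal sketch of how such a triplet might be turned into a cloze-style prompt and target for probing or washing, assuming one hand-written template per relation. The `Triplet` class, relation names, and templates are hypothetical and do not reflect the actual Wiki-Latest schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Triplet:
    """A (subject, relation, object) fact; field names are hypothetical."""
    subject: str
    relation: str
    obj: str


# Hypothetical relation -> cloze template map; {s} is replaced by the subject.
TEMPLATES = {
    "capital": "The capital of {s} is",
    "author": "{s} was written by",
}


def to_cloze(t: Triplet) -> tuple[str, str]:
    """Return (prompt, expected continuation) for one fact."""
    prompt = TEMPLATES[t.relation].format(s=t.subject)
    # Leading space: GPT-2-style BPE tokenizers encode a word after a space with the space attached.
    return prompt, " " + t.obj


facts = [
    Triplet("France", "capital", "Paris"),
    Triplet("Hamlet", "author", "William Shakespeare"),
]
pairs = [to_cloze(f) for f in facts]
# pairs == [("The capital of France is", " Paris"),
#           ("Hamlet was written by", " William Shakespeare")]
```

A model that still assigns high probability to the target continuation after washing has not forgotten the fact under this prompt, although, as Reviewer 02 points out, rephrased prompts would also need to be checked.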

Weaknesses

Knowledge vs. Reasoning: While the paper aims to disentangle knowledge and reasoning, it does not explicitly define what constitutes "reasoning," whereas knowledge is defined as triplets. Are the authors referring to specific modules or functionalities within the model, or to more abstract concepts related to the model's behavior?
Insufficient discussion on disentanglement: While the paper claims in Section 5.2 that "In this paper, we show the possibility of the disentanglement between knowledg

Reviewer 02 · Rating 6 · Confidence 4

Strengths

1. The paper is well-written with clear logic.
2. The experiments consider both small- and large-scale knowledge washing settings and include baselines for both model editing and machine unlearning methods.
3. The ablation studies are thorough.

Weaknesses

1. Model editing methods often face a generalization problem: after an edit for a specific query, the model's response reverts to the pre-edit state when the query is rephrased. This raises the question of whether LAW truly makes the model forget sensitive knowledge or only forgets the specific case. Jailbreak prompts are needed to verify true washing.
2. The abstract mentions that machine unlearning affects the fluency and reasoning ability of the model's generation. While

Reviewer 03 · Rating 6 · Confidence 4

Strengths

- The proposed unlearning method borrows ideas from existing knowledge editing methods to some degree, but the method itself is original and interesting in its problem setting and derivation (Section 4, Problem Setup, and Section 5, Methodology). In particular, the problem reformulation in Equation (8) is inspiring.
- The study conducts a wide range of experiments with several benchmarks and baselines, demonstrating the effectiveness of the proposed method.

Weaknesses

- Some existing unlearning methods are not considered as baselines, e.g., https://arxiv.org/abs/2309.11852.
- Some important basic information is not sufficiently explained, e.g., how K and K_w are calculated in the practical setting of the experiments. It would also help to explain in more detail how K and V in Equation (2) are derived.
- The authors conducted experiments with GPT-2 and GPT-J without demonstrating effectiveness on current state-of-the-art open models such as Llama 3 or Gemma.

Code & Models

Repositories

wangyu-ustc/largescalewashing
pytorch · Official

Taxonomy

Topics · Semantic Web and Ontologies