Towards Robust and Parameter-Efficient Knowledge Unlearning for LLMs

Sungmin Cha, Sungjun Cho, Dasol Hwang, and Moontae Lee

arXiv:2408.06621·cs.LG·April 28, 2025

TL;DR

This paper introduces LoKU, a novel framework for efficient and robust unlearning in LLMs that effectively removes sensitive data while preserving model performance, addressing stability and computational challenges of prior methods.

Contribution

The paper proposes Low-rank Knowledge Unlearning (LoKU), combining Inverted Hinge Loss and data-adaptive LoRA initialization to improve unlearning efficiency and stability in large language models.

Findings

01. LoKU effectively removes sensitive data from LLMs.
02. Maintains reasoning and generative capabilities post-unlearning.
03. Outperforms existing methods in stability and efficiency.

Abstract

Large Language Models (LLMs) have demonstrated strong reasoning and memorization capabilities via pretraining on massive textual corpora. However, this poses risks of privacy and copyright violations, highlighting the need for efficient machine unlearning methods that remove sensitive data without retraining from scratch. While Gradient Ascent (GA) is commonly used to unlearn by reducing the likelihood of generating unwanted content, it leads to unstable optimization and catastrophic forgetting of retained knowledge. We find that combining GA with low-rank adaptation results in poor trade-offs between computational cost and generative performance. To address these challenges, we propose Low-rank Knowledge Unlearning (LoKU), a novel framework that enables robust and efficient unlearning for LLMs. First, we introduce Inverted Hinge Loss, which suppresses unwanted tokens while maintaining…
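The contrast between GA and a hinge-style loss can be illustrated on a toy next-token distribution. The following is a minimal sketch, not the authors' implementation: the function names, the toy logits, and in particular the loss form `1 + p(target) - max over other tokens of p(v)` are assumptions made for illustration, since the abstract above is truncated before the loss is described.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def ga_loss(logits, target):
    """GA written as a loss to minimise: the log-probability of the target token."""
    return math.log(softmax(logits)[target])


def inverted_hinge_loss(logits, target):
    """Assumed hinge-style form: 1 + p(target) - max_{v != target} p(v)."""
    probs = softmax(logits)
    runner_up = max(p for v, p in enumerate(probs) if v != target)
    return 1.0 + probs[target] - runner_up


# Hypothetical logits for a 4-token vocabulary; token 0 is the one to unlearn.
base_logits = [4.0, 2.0, 1.0, 0.5]
target = 0

# Push the target logit down further and further and watch both losses.
for shift in (0.0, -5.0, -50.0):
    logits = list(base_logits)
    logits[target] += shift
    print(shift, ga_loss(logits, target), inverted_hinge_loss(logits, target))
```

In this sketch the GA loss keeps decreasing without bound as the target logit is pushed down, while the assumed hinge-style loss stays within (0, 2) because it is a difference of two probabilities plus one.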

Peer Reviews

Decision·ICLR 2025 Poster

Reviewer 01 · Rating 6 · Confidence 3

Strengths

1. The motivation is clearly explained.
2. Extensive experiments have been conducted to prove the effectiveness of the proposed methods.
3. The theoretical analysis strengthens the rationale of the proposed methods.

Weaknesses

1. One of the most significant contributions of this paper is the proposal of Inverted Hinge Loss (IHL), which claims to increase the probability of the second-best token only. However, it is not clear why IHL does not affect the probability of other tokens. Based on the definition of IHL in Line 233, the probability of all other tokens is impacted. As such, IHL can only address problem 1 (Line 224) but cannot address problems 2 and 3 of GA (Lines 224 ~ 226).
2. In Figures 3 and 5, the unlearning
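One way to examine the first concern is to differentiate the loss with respect to the logits. The derivation below is a sketch under stated assumptions: it takes IHL to have the form $1 + p_{x_t} - p_{v^\star}$, where $p_v$ is the softmax probability of token $v$, $z_v$ is its logit, and $v^\star$ is the most likely token other than $x_t$. This form and the symbols $z_v$, $p_v$, and $v^\star$ are introduced here for illustration, since the definition on Line 233 of the paper is not reproduced on this page.

$$
\mathcal{L} = 1 + p_{x_t} - p_{v^\star}, \qquad \frac{\partial p_u}{\partial z_v} = p_u \left( \mathbb{1}[u = v] - p_v \right)
$$

$$
\frac{\partial \mathcal{L}}{\partial z_v} =
\begin{cases}
p_{x_t} \left( p_{v^\star} - p_{x_t} + 1 \right) & v = x_t \\
p_{v^\star} \left( p_{v^\star} - p_{x_t} - 1 \right) & v = v^\star \\
p_v \left( p_{v^\star} - p_{x_t} \right) & \text{otherwise}
\end{cases}
$$

Under this assumed form, every logit does receive a gradient, which is consistent with the reviewer's reading. For tokens other than $x_t$ and $v^\star$, however, the gradient is scaled by the gap $p_{v^\star} - p_{x_t}$, and its sign flips once $p_{v^\star}$ exceeds $p_{x_t}$.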

Reviewer 02 · Rating 6 · Confidence 3

Strengths

1. The authors analyze the derivatives of GA and highlight its shortcomings. The motivation is clear, and the theoretical foundation strengthens the rationale for the proposed methods.
2. The introduction of IHL addresses the instability issues of GA by focusing gradient updates on a minimal number of viable replacements for the ground-truth token. This results in a more controlled and stable unlearning process.
3. The proposed strategies are effective. The authors evaluate the methods on multiple da

Weaknesses

The intuition and connection between the proposed methods, IHL and Fisher-Initialization of FILA, appear somewhat weak. This makes the paper feel like it is stacking two separate tricks rather than offering a unified and coherent approach. A more systematic linkage between these methods would enhance the overall coherence and impact of the paper.

Reviewer 03 · Rating 6 · Confidence 3

Strengths

### Originality
- This paper points out the shortcomings of Gradient Ascent (GA) by analyzing its inverse.
- This paper proposes two strategies to improve these shortcomings.
- This paper demonstrates the effectiveness of their improvements on two datasets.

### Clarity
- The structure of this paper is clear, and most of the content is explained clearly.

### Significance
- This paper provides insights into knowledge unlearning through the analysis of Gradient Ascent (GA).

Weaknesses

- This paper lacks state-of-the-art knowledge unlearning baselines (such as [1][2]). Although the main goal of the paper is to address the shortcomings of GA, incorporating state-of-the-art knowledge unlearning methods for comparison would make it more convincing.
- Some descriptions are not clear enough. For example, lines 221-223 should include more explanation of the reasons. The authors should explain in detail why GA increases the prediction score for all other tokens $v \neq x_t$ in the
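The behaviour of GA on the other tokens can be checked with a short derivation. The sketch below assumes the next-token distribution is a softmax over logits $z_v$ and that GA takes a plain gradient step of size $\eta$ directly on those logits. The symbols $z_v$, $p_v$, and $\eta$ are notation introduced here for illustration, not taken from the paper, and a real model shares parameters across logits, so this is an idealization.

$$
p_v = \frac{e^{z_v}}{\sum_{u} e^{z_u}}, \qquad \frac{\partial \log p_{x_t}}{\partial z_v} = \mathbb{1}[v = x_t] - p_v
$$

GA lowers $\log p_{x_t}$, so each logit moves against this gradient:

$$
z_{x_t} \leftarrow z_{x_t} - \eta \left( 1 - p_{x_t} \right), \qquad z_v \leftarrow z_v + \eta \, p_v \quad (v \neq x_t)
$$

Under these assumptions, each token other than $x_t$ has its logit raised in proportion to its current probability $p_v$.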

Code & Models

Repositories

csm9493/efficient-llm-unlearning
PyTorch · Official

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques

Methods: TOFU · Gradient Ascent · GPT-Neo · Adapter