LLM Unlearning using Gradient Ratio-Based Influence Estimation and Noise Injection

Ameya Anjarlekar; Sandeep Pombra

arXiv:2508.06467·cs.LG·August 11, 2025

Open Access

TL;DR

This paper introduces GRIN, a modular framework for unlearning in large language models. GRIN uses a gradient-ratio metric to identify the parameters most responsible for memorizing sensitive data, then injects noise into those parameters before fine-tuning, which improves unlearning while maintaining model utility.

Contribution

The paper presents a novel gradient-ratio-based metric for targeted unlearning and combines it with selective noise injection to enhance unlearning in LLMs while preserving utility.

Findings

01 Effective unlearning demonstrated on standard benchmarks (TOFU, WMDP, and SafePKU)

02 Gradient ratio metric accurately identifies parameters for forgetting

03 Noise injection improves unlearning without significant utility loss

Abstract

The growing legal and ethical scrutiny of large language models (LLMs) necessitates effective machine unlearning, particularly for sensitive or unauthorized data. Existing empirical methods often yield incomplete forgetting or unintended degradation of unrelated knowledge due to poor localization. In this work, we propose GRIN: a modular and targeted framework for LLM unlearning. GRIN introduces a novel gradient-ratio-based metric to identify parameters most responsible for memorizing forget data. We then perform selective noise injection into these parameters prior to fine-tuning, which improves unlearning performance while maintaining model utility. Finally, we propose new evaluation metrics tailored to the LLM setting and validate our approach on standard benchmarks such as TOFU, WMDP, and SafePKU.
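The two steps the abstract describes, scoring parameters by a gradient ratio and then injecting noise only into the highest-scoring ones, can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract does not give the exact form of the metric, so the ratio used here (magnitude of the forget-set gradient over magnitude of the retain-set gradient), the top-fraction selection rule, the Gaussian noise, and all function names and toy values are assumptions made for the example.

```python
import random


def gradient_ratio_scores(forget_grads, retain_grads, eps=1e-8):
    """Score each parameter as |forget gradient| / (|retain gradient| + eps).

    Assumption: an illustrative stand-in for a gradient-ratio metric,
    not the exact definition used in the paper.
    """
    return [abs(gf) / (abs(gr) + eps) for gf, gr in zip(forget_grads, retain_grads)]


def select_top_fraction(scores, fraction):
    """Return the indices of the highest-scoring parameters (at least one)."""
    k = max(1, int(len(scores) * fraction))
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])


def inject_noise(params, indices, sigma, rng):
    """Return a copy of params with Gaussian noise added only at `indices`."""
    noisy = list(params)
    for i in indices:
        noisy[i] += rng.gauss(0.0, sigma)
    return noisy


# Toy values (hypothetical): five scalar parameters with per-parameter
# gradients of a forget-set loss and a retain-set loss.
params = [0.5, -1.2, 0.3, 2.0, -0.7]
forget_grads = [0.9, 0.01, 0.02, 0.8, 0.05]
retain_grads = [0.01, 0.5, 0.4, 0.02, 0.6]

scores = gradient_ratio_scores(forget_grads, retain_grads)
selected = select_top_fraction(scores, fraction=0.4)
noisy_params = inject_noise(params, selected, sigma=0.1, rng=random.Random(0))
```

In this toy setup, parameters 0 and 3 have large forget-set gradients and small retain-set gradients, so they are the ones selected and perturbed; the rest are left untouched. The fine-tuning stage that the abstract says follows noise injection is not shown.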

Taxonomy

Topics: Image and Signal Denoising Methods