Gauss-Newton Unlearning for the LLM Era
Lev McKinney, Anvith Thudi, Juhan Bae, Tara Rezaei, Nicolas Papernot, Sheila A. McIlraith, Roger Grosse

TL;DR
This paper introduces K-FADE, a novel unlearning method for large language models that efficiently removes specific data influences while preserving overall model performance, using Gauss-Newton steps with Hessian approximations.
Contribution
It proposes K-FADE, a state-of-the-art unlearning approach combining Gauss-Newton steps with Kronecker-Factored Approximate Curvature for effective data removal in LLMs.
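To illustrate why a Kronecker-factored curvature approximation makes such steps cheap, here is a minimal NumPy sketch for a single linear layer. It is not taken from the paper: the function names, the per-factor damping, and the toy shapes are all assumptions made for this example. It relies only on the standard identity that solving against kron(A, G) reduces to two small solves, one per factor.

```python
import numpy as np

def kfac_solve(A, G, V, damping=1e-3):
    """Apply the inverse of kron(A_d, G_d) to a weight-shaped matrix V.

    A: (in, in) second moment of layer inputs (symmetric PSD).
    G: (out, out) second moment of output gradients (symmetric PSD).
    V: (out, in) matrix, e.g. a gradient for the layer's weights.
    Damping is added to each factor separately (an assumption of this sketch).
    """
    A_d = A + damping * np.eye(A.shape[0])
    G_d = G + damping * np.eye(G.shape[0])
    left = np.linalg.solve(G_d, V)          # G_d^-1 V
    # A_d is symmetric, so left @ A_d^-1 == (A_d^-1 @ left.T).T
    return np.linalg.solve(A_d, left.T).T   # G_d^-1 V A_d^-1

def dense_solve(A, G, V, damping=1e-3):
    """Reference: build the full Kronecker product and solve directly."""
    A_d = A + damping * np.eye(A.shape[0])
    G_d = G + damping * np.eye(G.shape[0])
    K = np.kron(A_d, G_d)                   # (in*out, in*out), costly at scale
    x = np.linalg.solve(K, V.flatten(order="F"))  # column-major vec(V)
    return x.reshape(V.shape, order="F")

# Toy data: 64 samples, 5 input features, 3 output units (hypothetical sizes).
rng = np.random.default_rng(0)
acts = rng.normal(size=(64, 5))
grads = rng.normal(size=(64, 3))
A = acts.T @ acts / 64
G = grads.T @ grads / 64
V = rng.normal(size=(3, 5))

fast = kfac_solve(A, G, V)
slow = dense_solve(A, G, V)
```

The factored solve only ever inverts an (in x in) and an (out x out) matrix, whereas the dense reference inverts an (in*out x in*out) matrix, which is what becomes infeasible for LLM-sized layers.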
Findings
K-FADE effectively suppresses outputs from the forget set.
It approximates retraining results while minimally affecting retain set outputs.
Unlearning updates can be efficiently reapplied for continued model maintenance.
Abstract
Standard large language model training can create models that produce outputs their trainer deems unacceptable in deployment. The probability of these outputs can be reduced using methods such as LLM unlearning. However, unlearning a set of data (called the forget set) can degrade model performance on other distributions where the trainer wants to retain the model's behavior. To improve this trade-off, we demonstrate that using the forget set to compute only a few uphill Gauss-Newton steps provides a conceptually simple, state-of-the-art unlearning approach for LLMs. While Gauss-Newton steps adapt Newton's method to non-linear models, it is non-trivial to efficiently and accurately compute such steps for LLMs. Hence, our approach crucially relies on parametric Hessian approximations such as Kronecker-Factored Approximate Curvature (K-FAC). We call this combined approach K-FADE (K-FAC…
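As a rough illustration of an "uphill Gauss-Newton step", the sketch below takes one damped, curvature-preconditioned ascent step on a toy logistic regression. This is not the paper's implementation: the model, the helper names, the damping and step size, and the choice of which data the curvature is estimated from (`X_curv`) are all assumptions of this example. For logistic regression the Gauss-Newton matrix coincides with the Hessian, so the exact matrix is used here in place of K-FAC.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def nll(theta, X, y):
    """Mean negative log-likelihood of binary labels under logistic regression."""
    z = X @ theta
    return np.mean(np.logaddexp(0.0, z) - y * z)

def grad_nll(theta, X, y):
    return X.T @ (sigmoid(X @ theta) - y) / len(y)

def ggn(theta, X):
    """Gauss-Newton matrix J^T H J for logistic regression: X^T diag(p(1-p)) X / n."""
    p = sigmoid(X @ theta)
    return (X.T * (p * (1.0 - p))) @ X / X.shape[0]

def gauss_newton_ascent_step(theta, X_forget, y_forget, X_curv,
                             damping=1e-2, step_size=1.0):
    """Move uphill on the forget loss, preconditioned by damped curvature."""
    g = grad_nll(theta, X_forget, y_forget)
    H = ggn(theta, X_curv) + damping * np.eye(theta.size)
    return theta + step_size * np.linalg.solve(H, g)

# Toy setup (hypothetical): fit on all data, then "forget" the first 20 rows.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
true_theta = np.array([1.0, -2.0, 0.5, 0.0])
y = (rng.random(200) < sigmoid(X @ true_theta)).astype(float)

theta = np.zeros(4)
for _ in range(200):                      # plain gradient descent "training"
    theta -= 0.5 * grad_nll(theta, X, y)

X_f, y_f = X[:20], y[:20]
theta_new = gauss_newton_ascent_step(theta, X_f, y_f, X_curv=X)
before = nll(theta, X_f, y_f)
after = nll(theta_new, X_f, y_f)
```

After the step the forget-set loss is higher than before, which is the intended direction for unlearning. How much the retain-set loss moves depends on the curvature estimate and damping, which this toy does not attempt to tune.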
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications
