Alternate Preference Optimization for Unlearning Factual Knowledge in Large Language Models

Anmol Mekala, Vineeth Dorna, Shreya Dubey, Abhishek Lalwani, David Koleczek, Mukund Rungta, Sadid Hasan, Elita Lobo

arXiv:2409.13474 · cs.CL · January 23, 2025

TL;DR

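This paper introduces AltPO, a method for unlearning specific training data from large language models. It combines negative feedback with in-domain positive feedback on the forget set, improving both unlearning effectiveness and the quality of the model's responses to forget-set queries.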

Contribution

It proposes an unlearning approach that pairs negative feedback with positive feedback, addressing a key limitation of existing LLM unlearning methods: they rely solely on negative feedback, which often produces nonsensical or inconsistent outputs.

Findings

01. Effectively unlearns specific training data (the forget set)

02. Maintains overall model performance

03. Avoids nonsensical responses during unlearning

Abstract

Machine unlearning aims to efficiently eliminate the influence of specific training data, known as the forget set, from the model. However, existing unlearning methods for Large Language Models (LLMs) face a critical challenge: they rely solely on negative feedback to suppress responses related to the forget set, which often results in nonsensical or inconsistent outputs, diminishing model utility and posing potential privacy risks. To address this limitation, we propose a novel approach called Alternate Preference Optimization (AltPO), which combines negative feedback with in-domain positive feedback on the forget set. Additionally, we introduce new evaluation metrics to assess the quality of responses related to the forget set. Extensive experiments show that our approach not only enables effective unlearning but also avoids undesirable model behaviors while maintaining overall model…
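
The abstract describes AltPO only as combining negative feedback with in-domain positive feedback on the forget set. One way to read that idea is a DPO-style preference loss in which a plausible in-domain alternate answer is treated as the preferred response and the original forget-set answer as the dispreferred one. The sketch below illustrates that reading under stated assumptions. It is not the authors' exact objective (see the linked repository for their implementation). The function names, the temperature `beta`, and the averaging over K alternate answers are all hypothetical choices made for illustration.

```python
import math
from typing import Sequence


def log_sigmoid(z: float) -> float:
    """Numerically stable log(sigmoid(z))."""
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))


def alt_preference_loss(
    logp_alt: Sequence[float],      # log pi_theta(y_alt_k | x) for K alternate answers (hypothetical setup)
    ref_logp_alt: Sequence[float],  # log pi_ref(y_alt_k | x) under a frozen reference model
    logp_forget: float,             # log pi_theta(y_f | x) for the original forget-set answer
    ref_logp_forget: float,         # log pi_ref(y_f | x)
    beta: float = 0.1,              # hypothetical preference temperature
) -> float:
    """DPO-style loss (an assumption, not the paper's exact objective).

    Positive feedback: raise the likelihood of each in-domain alternate answer.
    Negative feedback: lower the likelihood of the original forget-set answer.
    The loss is averaged over the K alternate answers.
    """
    if not logp_alt or len(logp_alt) != len(ref_logp_alt):
        raise ValueError("need matching, non-empty alternate log-prob lists")
    forget_log_ratio = logp_forget - ref_logp_forget
    losses = []
    for lp, ref_lp in zip(logp_alt, ref_logp_alt):
        # Margin between the preferred (alternate) and dispreferred (forget) log-ratios.
        margin = beta * ((lp - ref_lp) - forget_log_ratio)
        losses.append(-log_sigmoid(margin))
    return sum(losses) / len(losses)


if __name__ == "__main__":
    # Policy identical to the reference: zero margin, so the loss is log 2.
    print(alt_preference_loss([-5.0], [-5.0], -3.0, -3.0))
    # Alternate answer became more likely and the forget answer less likely: loss drops below log 2.
    print(alt_preference_loss([-4.0], [-5.0], -4.0, -3.0))
```

When the policy still matches the reference model the margin is zero and the loss equals log 2. Gradient descent then lowers the loss by shifting probability mass from the original answer toward the alternates, rather than only pushing the original answer down. Under this reading, the positive term is what keeps forget-set responses fluent instead of degenerate.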

Code & Models

Repositories

molereddy/Alternate-Preference-Optimization
PyTorch · Official

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Data Quality and Management