Soft Prompting for Unlearning in Large Language Models
Karuna Bhaila, Minh-Hao Van, Xintao Wu

TL;DR
This paper introduces SPUL, a lightweight soft prompting method enabling large language models to unlearn specific data subsets at inference time, balancing utility and forgetting without updating model weights.
Contribution
The paper proposes SPUL, a soft prompting framework for efficient unlearning in LLMs. It trains only a small set of prompt tokens and leaves the LLM's own weights frozen, offering a lightweight alternative to fine-tuning-based unlearning.
Findings
SPUL effectively unlearns specific data with minimal utility loss.
The method scales across multiple LLM architectures.
Hyperparameter choices influence unlearning effectiveness.
Abstract
The widespread popularity of Large Language Models (LLMs), partly due to their unique ability to perform in-context learning, has also brought to light the importance of ethical and safety considerations when deploying these pre-trained models. In this work, we focus on investigating machine unlearning for LLMs motivated by data protection regulations. In contrast to the growing literature on fine-tuning methods to achieve unlearning, we focus on a comparatively lightweight alternative called soft prompting to realize the unlearning of a subset of training data. With losses designed to enforce forgetting as well as utility preservation, our framework Soft Prompting for Unlearning (SPUL) learns prompt tokens that can be appended to an arbitrary query to induce unlearning of specific examples at inference time without updating LLM parameters. We conduct…
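The training setup the abstract describes (one loss for forgetting, one for utility preservation, with only the prompt tokens being learned) can be sketched as a single objective. This is an illustrative form, not the paper's exact formulation: the symbols \theta, p, D_f, D_r, f_\theta, the two loss names, and the trade-off weight \lambda are all assumptions introduced here, since the excerpt does not define them.

```latex
% Assumed notation (not from the source):
%   \theta    = frozen LLM weights
%   p         = learnable soft-prompt tokens
%   D_f, D_r  = forget set and retain set
%   \lambda   = hypothetical trade-off weight, \lambda \ge 0
\[
  p^{\star} \;=\; \arg\min_{p}\;
  \mathcal{L}_{\text{forget}}(p;\, \theta, D_f)
  \;+\; \lambda\, \mathcal{L}_{\text{retain}}(p;\, \theta, D_r),
  \qquad \theta \text{ held fixed.}
\]
% At inference time the learned prompt is appended to a query x
% and passed through the unchanged model f_\theta:
\[
  \hat{y} \;=\; f_{\theta}\big([\,x;\; p^{\star}\,]\big)
\]
```

Under this sketch only p receives gradient updates, so the number of trainable values grows with the prompt length rather than with the size of the model.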
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Speech and dialogue systems
