SEPS: A Separability Measure for Robust Unlearning in LLMs
Wonje Jeung, Sangyeon Yoon, Albert No

TL;DR
This paper introduces SEPS, a new evaluation framework for unlearning in LLMs that assesses the model's ability to forget and retain information simultaneously within mixed prompts, addressing limitations of existing metrics.
Contribution
The paper proposes two things: SEPS, an evaluation framework for mixed prompts, and Mixed Prompt unlearning, a training strategy that makes unlearning in LLMs more robust when forget and retain queries appear in the same prompt.
Findings
SEPS effectively measures mixed-query unlearning performance.
Mixed Prompt unlearning improves robustness in multi-query settings.
Existing methods often overfit or, in the untargeted case, erase retain content along with forget content once a forget query appears.
Abstract
Machine unlearning aims to selectively remove targeted knowledge from Large Language Models (LLMs), ensuring they forget specified content while retaining essential information. Existing unlearning metrics assess whether a model correctly answers retain queries and rejects forget queries, but they fail to capture real-world scenarios where forget queries rarely appear in isolation. In fact, forget and retain queries often coexist within the same prompt, making mixed-query evaluation crucial. We introduce SEPS, an evaluation framework that explicitly measures a model's ability to both forget and retain information within a single prompt. Through extensive experiments across three benchmarks, we identify two key failure modes in existing unlearning methods: (1) untargeted unlearning indiscriminately erases both forget and retain content once a forget query appears, and (2) targeted…
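To make the mixed-query setting concrete, the following is a minimal sketch of how a prompt containing both a forget query and a retain query could be built and checked. It is not the SEPS metric, whose definition is not given in this summary. The `QA` type, the prompt template, the substring-matching rule, the `joint_success_rate` function, and the two toy models are all hypothetical and exist only for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class QA:
    """A question with its ground-truth answer (hypothetical container)."""
    question: str
    answer: str


def build_mixed_prompt(forget: QA, retain: QA) -> str:
    # Assumed template: both questions are placed in one prompt.
    return f"Answer both questions.\n1. {forget.question}\n2. {retain.question}"


def joint_success_rate(model: Callable[[str], str],
                       pairs: List[Tuple[QA, QA]]) -> float:
    """Fraction of mixed prompts where the forget answer is withheld
    AND the retain answer is given. Illustrative only, not SEPS."""
    if not pairs:
        raise ValueError("pairs must not be empty")
    hits = 0
    for forget, retain in pairs:
        response = model(build_mixed_prompt(forget, retain)).lower()
        forgot = forget.answer.lower() not in response    # crude substring check
        retained = retain.answer.lower() in response
        hits += int(forgot and retained)
    return hits / len(pairs)


# Toy data and toy "models" (plain functions standing in for an LLM).
pairs = [
    (QA("Where was the fictional author A born?", "Atlantis"),
     QA("What is the capital of France?", "Paris")),
]


def refuse_everything(prompt: str) -> str:
    # Mimics the first failure mode: a forget query makes the model drop everything.
    return "I don't know."


def selective(prompt: str) -> str:
    # Mimics the desired behavior: refuse the forget part, answer the retain part.
    return "1. I can't help with that. 2. Paris."


print(joint_success_rate(refuse_everything, pairs))
print(joint_success_rate(selective, pairs))
```

Under this toy check, a model that refuses everything scores no better than one that leaks the forget answer, which is the kind of distinction that evaluating forget and retain queries separately would not reveal.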
Taxonomy
Topics: Diverse Research and Applications · Higher Education Learning Practices
