TL;DR
UniErase introduces a novel, precise, and balanced unlearning framework for large language models, effectively removing outdated knowledge while retaining overall model ability, outperforming existing methods across various benchmarks.
Contribution
The paper proposes UniErase, a new editing-based unlearning paradigm using Unlearning Tokens and Edits to achieve precise, balanced unlearning and ability retention in LLMs.
Findings
Outperforms 8 baselines on TOFU benchmark.
Modifies only 3.66% of parameters for effective unlearning.
Achieves 4.01× better model ability retention than previous methods.
Abstract
Large language models (LLMs) require iterative updates to address the outdated information problem, where LLM unlearning offers an approach for selective removal. However, mainstream unlearning methods primarily rely on fine-tuning techniques, which often lack precision in targeted unlearning and struggle to balance unlearning efficacy with general ability under massive and sequential settings. To bridge this gap, in this work, we introduce UniErase, a novel unlearning framework that demonstrates precision and balanced performances between knowledge unlearning and ability retaining. We first propose the Unlearning Token, which is optimized to steer LLMs toward a forgetting space. To achieve concrete unlearning behaviors, we further introduce the lightweight Unlearning Edit to efficiently associate the unlearning targets with this meta-token. Serving as a new unlearning paradigm via…
Peer Reviews
Decision: ICLR 2026 Conference Withdrawn Submission
**S1**. The paper clearly defines the Unlearning Logical Chain and derives closed-form parameter updates (Eq. 11, 13), bridging intuitive motivation and formalism. **S2**. The evaluations include batch, sequential, and precise unlearning, with both synthetic (TOFU) and real (RETURN) data. **S3**. The proposed method can be very efficient as it updates only a fraction of the parameters.
**W1**. The claim that UniErase “pioneers” the modeling of LLM unlearning as a knowledge editing problem is overstated. For example, see [1]. **W2**. The notation is a bit sloppy, e.g., in Eq. 7, $a'$ was previously used for $D \setminus D_f$, and the frequent use of $a$ and $\alpha$ can be confusing. **W3**. The presentation of the paper can be improved. For example, the methodology is explained through a few steps in Sec. 4.1 – 4.3, but each is very verbose and includes notational details that can b
- Breaks through the limitations of mainstream fine-tuning-based unlearning methods, proposing UniErase, a new unlearning paradigm that models LLM unlearning as a knowledge editing problem. By directly modifying model parameters instead of multi-round fine-tuning, it expands the research scope of the unlearning field and provides a new direction for subsequent studies.
- Achieves dual-high performance in unlearning efficacy and general ability retention. With only ~3.66% of LLM parameters modifie
- This study lacks model diversity, as experiments were only conducted on two models from the LLaMA series, failing to provide verification of generalization.
- The expression in the paper is not clear enough, and the same issue applies to the presentation of figures. For instance, Figures 1, 2, 3, 4, and 6 almost all have overlaps between text and graphics, with even text from subfigures overlapping other subfigures. This clearly fails to meet the publication requirements for academic papers.
-
The paper presents a clear problem formulation and reports improved unlearning performance over benchmarked methods, while maintaining better balance in sequential unlearning scenarios.
The novelty of UniErase is limited; its core model editing component closely mirrors prior work, offering little methodological advancement. The benchmarks used are outdated, with stronger recent unlearning methods omitted, making the claimed advantage unconvincing. Moreover, UniErase underperforms on MMLU, HumanEval, and GSM-8K. The reported evaluation metric is a cocktail of multiple scores, a questionable approach to reporting performance.
This paper tackles the important problem of knowledge unlearning with a clear conceptual division between the meta unlearning token and the editing process. The meta unlearning token serves as a well-defined objective for model editing, and although the closed-form derivation is not new, it provides a transparent analytical means for implementing the edit.
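To make the idea of a closed-form edit concrete, here is a minimal sketch of the generic rank-one update familiar from the knowledge-editing literature. It is not the paper's Eq. 11 or Eq. 13, which are not reproduced on this page. The toy matrix, `forget_key`, `retain_key`, and `unl_value` are hypothetical stand-ins: the key plays the role of a hidden representation of a fact to forget, and the value plays the role of an output that would elicit the unlearning token.

```python
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def matvec(W: Matrix, x: Vector) -> Vector:
    """Multiply matrix W (rows x cols) by vector x (length cols)."""
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) for row in W]


def rank_one_edit(W: Matrix, key: Vector, target: Vector) -> Matrix:
    """Return W' = W + (target - W key) key^T / (key^T key).

    This is the minimum-Frobenius-norm change to W such that W' key == target.
    Any input orthogonal to `key` is mapped exactly as before.
    """
    key_norm_sq = sum(k * k for k in key)
    if key_norm_sq == 0.0:
        raise ValueError("key must be non-zero")
    # How far the current output for `key` is from the desired target.
    residual = [t - v for t, v in zip(target, matvec(W, key))]
    return [
        [w_ij + r_i * k_j / key_norm_sq for w_ij, k_j in zip(row, key)]
        for row, r_i in zip(W, residual)
    ]


# Hypothetical toy layer: 2 outputs, 3 inputs (illustration only).
W = [[1.0, 0.0, 2.0],
     [0.0, 1.0, 0.0]]
forget_key = [1.0, 0.0, 0.0]   # assumed representation of the fact to forget
retain_key = [0.0, 1.0, 0.0]   # assumed unrelated fact, orthogonal to forget_key
unl_value = [5.0, -5.0]        # assumed value that would trigger the unlearning token

W_edited = rank_one_edit(W, forget_key, unl_value)
```

In this toy setting the edited layer sends `forget_key` to `unl_value` while leaving the output for `retain_key` untouched, which mirrors the precision-versus-retention trade-off the reviews discuss. Real hidden representations are rarely orthogonal, so practical editing methods typically add an explicit preservation term rather than relying on this property.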
- UniErase borrows heavily from the editing playbook and applying such editing paradigms to unlearning has been previously explored, so the contribution feels more like an engineering consolidation than a conceptual advance. The key difference appears to lie in defining a shared target object, [UNL], a construct that was discussed in previous paper [1]. Yet the paper does not sufficiently clarify why this intermediate meta token is necessary or preferable compared to directly editing toward an “
Taxonomy
Methods: TOFU
