Rethinking Benign Relearning: Syntax as the Hidden Driver of Unlearning Failures

Sangyeon Yoon; Hyesoo Hong; Wonje Jeung; Albert No

arXiv:2602.03379·cs.LG·February 4, 2026

Open Access · 3 Reviews

TL;DR

This paper reveals that syntactic similarity, not topical relevance, causes benign relearning in machine unlearning, and proposes syntactic diversification to improve unlearning effectiveness and stability.

Contribution

It identifies syntactic similarity as the key factor in benign relearning and introduces a syntactic diversification method to enhance unlearning robustness.

Findings

1. Syntactic similarity triggers recovery of forgotten information.
2. Syntactic diversification reduces relearning and improves unlearning speed.
3. The approach alleviates the trade-off between unlearning and model utility.

Abstract

Machine unlearning aims to remove specific content from trained models while preserving overall performance. However, the phenomenon of benign relearning, in which forgotten information reemerges even from benign fine-tuning data, reveals that existing unlearning methods remain fundamentally fragile. A common explanation attributes this effect to topical relevance, but we find this account insufficient. Through systematic analysis, we demonstrate that syntactic similarity, rather than topicality, is the primary driver: across benchmarks, syntactically similar data consistently trigger recovery even without topical overlap, due to their alignment in representations and gradients with the forgotten content. Motivated by this insight, we introduce syntactic diversification, which paraphrases the original forget queries into heterogeneous structures prior to unlearning. This approach…

Peer Reviews

Decision·ICLR 2026 Poster

Reviewer 01 · Rating 4 · Confidence 5

Strengths

- The paper is well written.
- The paper provides a new explanation of the success of relearning in the context of LLM unlearning, which is important for understanding in this community.

Weaknesses

- One thing this paper significantly overlooks is **between which two sets** syntactic similarity should be measured. There are three sets: the unlearning set $D_{forget}$, the eval set $D_{target}$, and the relearn set $D_{relearn}$. Section 5 characterizes the syntactic similarity between $D_{target}$ and $D_{relearn}$, but there are other dimensions: the syntactic similarity between $D_{forget}$ and $D_{relearn}$, and the syntactic similarity between $D_{forget}$ and $D_{target}$. In the TOFU case

Reviewer 02 · Rating 6 · Confidence 4

Strengths

1. The problem is significant and clearly addressed. The authors challenge the mainstream understanding in the field (topic relevance-driven) and propose a novel and insightful perspective (syntactic similarity-driven), which is crucial for understanding the failure of forgetting mechanisms.
2. The experimental design is rigorous and comprehensive, ensuring a fair evaluation. Evaluation confounding in BLUR is identified and eliminated, the number of steps is standardized, and the optimal result

Weaknesses

1. Limitations of the syntactic similarity metric. The paper uses normalized Levenshtein distance as a measure of syntactic similarity. While this is a simple and effective character-level metric, it may fail to capture more abstract and deeper syntactic structures (such as the structural similarity of parse trees). It is suggested to explore using more sophisticated syntactic analysis tools to measure syntactic similarity, which could reveal more subtle mechanisms.
2. Cost of the proposed solu
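For readers unfamiliar with the metric named in the first weakness, the following is a minimal sketch of a character-level normalized Levenshtein similarity. It is not taken from the paper: the function names are hypothetical, and dividing by the length of the longer string is an assumed normalization, since the excerpt does not state which one the authors use.

```python
def levenshtein(a: str, b: str) -> int:
    """Character-level edit distance with unit-cost insert, delete, substitute."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute (or match)
            ))
        prev = curr
    return prev[-1]


def normalized_similarity(a: str, b: str) -> float:
    """Similarity in [0, 1]; 1.0 means identical strings.

    ASSUMPTION: normalizes by the longer string's length.
    """
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


if __name__ == "__main__":
    # Hypothetical queries: same template, unrelated topics.
    q1 = "What is the full name of the author born in Taipei?"
    q2 = "What is the full name of the chemist born in Lisbon?"
    print(normalized_similarity(q1, q2))
```

Because the score is computed over characters, two queries that share a template score high even when their topics differ, which is also why it cannot see parse-tree structure, as the reviewer notes.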

Reviewer 03 · Rating 6 · Confidence 3

Strengths

1. The paper shows that syntactic similarity (in the query), not topicality, is the consistent driver of relearning across methods (GA, NPO, SCRUB) and datasets. The paper also identifies evaluation confounds (dataset size -> step budget, i.e., non-monotonic training trajectories) that can make topicality look stronger than it is, then re-evaluates with a step-standardized protocol. This corrects the narrative and is a valuable insight for the community.
2. The analysis provided with the heatmaps, rele

Weaknesses

1. The approach relies on GPT-4o paraphrasing. What is the cost/latency at unlearning time for large forget sets, and does quality vary by domain or language? A scaling/cost analysis and a cheaper in-house paraphrase baseline would help adoption.
2. Keyword-based relearn success rate captures name reappearance but may miss partial leakage or paraphrastic leakage. Similarly, ROUGE-L to base captures surface similarity but not factual equivalence. Including embedding-based and judge-LM evaluation
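The two metrics questioned in the second weakness can be illustrated with a short sketch. This is an assumption-laden illustration and not the paper's evaluation code: the function names are hypothetical, keyword matching is taken to be a case-insensitive substring test, and ROUGE-L is computed as an F1 score over whitespace tokens using the longest common subsequence.

```python
def keyword_success_rate(outputs, keywords_per_output):
    """Fraction of outputs that contain at least one of their target keywords.

    ASSUMPTION: case-insensitive substring matching.
    """
    if not outputs:
        return 0.0
    hits = sum(
        any(k.lower() in out.lower() for k in kws)
        for out, kws in zip(outputs, keywords_per_output)
    )
    return hits / len(outputs)


def lcs_length(x, y):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(y) + 1)
    for xi in x:
        curr = [0]
        for j, yj in enumerate(y, start=1):
            curr.append(prev[j - 1] + 1 if xi == yj else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(candidate: str, reference: str) -> float:
    """ROUGE-L F1 over lowercase whitespace tokens (assumed tokenization)."""
    c, r = candidate.lower().split(), reference.lower().split()
    if not c or not r:
        return 0.0
    lcs = lcs_length(c, r)
    if lcs == 0:
        return 0.0
    precision, recall = lcs / len(c), lcs / len(r)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Hypothetical name; the second output leaks it in an abbreviated form.
    outputs = ["The writer is Jane Roe.", "It was written by J. Roe."]
    keywords = [["Jane Roe"], ["Jane Roe"]]
    print(keyword_success_rate(outputs, keywords))
```

In the usage example the abbreviated name in the second output is not counted as a hit, which is the kind of partial leakage the reviewer is concerned about.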

Taxonomy

Topics: Multimodal Machine Learning Applications · Topic Modeling · Domain Adaptation and Few-Shot Learning