Geometric-disentanglement Unlearning
Duo Zhou, Yuji Zhang, Tianxin Wei, Ruizhong Qiu, Ke Yang, Xiao Lin, Cheng Qian, Jingrui He, Hanghang Tong, Chengxiang Zhai, Heng Ji, Huan Zhang

TL;DR
This paper introduces Geometric-disentanglement Unlearning (GU), a theoretically grounded method that improves the balance between forgetting private or harmful content and retaining knowledge in large language models, with provable guarantees.
Contribution
It formalizes the 'no side effects' principle in LLM unlearning, and proposes a projection-based method that reduces forget-retain trade-offs with theoretical backing.
Findings
GU enhances forgetting effectiveness by up to 62% in Extraction Strength.
GU reduces collateral knowledge loss, improving retain ES by 31%.
Experiments on multiple datasets demonstrate GU's effectiveness and theoretical soundness.
Abstract
Large language models (LLMs) can internalize private or harmful content, motivating unlearning that removes a forget set while preserving retained knowledge. However, forgetting updates often cause collateral degradation of retained knowledge, creating a persistent trade-off. Existing LLM unlearning methods are often heuristic, and other theoretical approaches rely on offline feature constructions that do not capture update-time forget-retain interaction in LLMs. To address this limitation, we aim to develop an LLM unlearning method that reduces the forget-retain trade-off with theoretical guarantees. We take a first-principles view by formalizing "no side effects" as local retain invariance under small parameter updates, and prove an equivalence under optimizer-induced geometry: the retain loss is locally invariant if and only if the update direction is orthogonal to the subspace…
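The orthogonality condition in the abstract can be illustrated with a small projection example. This is a minimal sketch, not the paper's algorithm: the abstract is truncated before it names the subspace, so the code assumes the subspace is spanned by a few retain-loss gradients and uses plain Euclidean geometry instead of the optimizer-induced geometry the authors describe. The function names (`dot`, `orthonormal_basis`, `project_out`) and the toy vectors are hypothetical.

```python
# Illustrative sketch only. Assumptions (not from the paper): Euclidean
# geometry, and a retain subspace spanned by a few retain-loss gradients.

def dot(u, v):
    """Euclidean inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def orthonormal_basis(vectors, tol=1e-12):
    """Modified Gram-Schmidt: orthonormal basis for span(vectors)."""
    basis = []
    for v in vectors:
        w = list(v)
        for b in basis:
            c = dot(w, b)
            w = [wi - c * bi for wi, bi in zip(w, b)]
        norm = dot(w, w) ** 0.5
        if norm > tol:  # skip vectors already (numerically) in the span
            basis.append([wi / norm for wi in w])
    return basis

def project_out(update, retain_grads):
    """Remove from `update` its component inside span(retain_grads)."""
    out = list(update)
    for b in orthonormal_basis(retain_grads):
        c = dot(out, b)
        out = [oi - c * bi for oi, bi in zip(out, b)]
    return out

# Hypothetical toy values for a 3-parameter model.
retain_grads = [[1.0, 0.0, 0.0], [1.0, 1.0, 0.0]]
forget_update = [0.5, -2.0, 3.0]
safe_update = project_out(forget_update, retain_grads)
```

Under these assumptions, `safe_update` has zero inner product with every retain gradient, so a first-order Taylor expansion of the retain loss is unchanged by a small step along it.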
Taxonomy
Topics: Misinformation and Its Impacts · Advanced Graph Neural Networks · Adversarial Robustness in Machine Learning
