From Anchors to Supervision: Memory-Graph Guided Corpus-Free Unlearning for Large Language Models
Wenxuan Li, Zhenfei Zhang, Mi Zhang, Geng Hong, Mi Wen, Xiaoyu You, Min Yang

TL;DR
The paper introduces MAGE, an unlearning framework for large language models. Given only a minimal user anchor, it probes the model to build a memory graph of what the model has memorized about the target, then uses that graph to erase the content without access to the original training data.
Contribution
MAGE is a model-agnostic, corpus-free unlearning method that leverages memory graphs and minimal anchors to enable effective and auditable unlearning in large language models.
Findings
MAGE achieves unlearning performance comparable to reference-based supervision.
MAGE preserves overall utility of the language models after unlearning.
MAGE enables a practical, minimal-anchor-driven unlearning workflow.
Abstract
Large language models (LLMs) may memorize sensitive or copyrighted content, raising significant privacy and legal concerns. While machine unlearning has emerged as a potential remedy, prevailing paradigms rely on user-provided forget sets, making unlearning requests difficult to audit and exposing systems to secondary leakage and malicious abuse. We propose MAGE, a Memory-grAph Guided Erasure framework for user-minimized, corpus-free unlearning. Given only a lightweight user anchor that identifies a target entity, MAGE probes the target LLM to recover target-related memorization, organizes it into a weighted local memory graph, and synthesizes scoped supervision for unlearning. MAGE is model-agnostic, can be plugged into standard unlearning methods, and requires no access to the original training corpus. Experiments on two benchmarks, TOFU and RWKU, demonstrate that MAGE's…
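The probe, graph, and supervision steps named in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the `probe` callable, the prompt templates, the frequency-based edge weights, the `min_weight` threshold, and the question/answer format of the forget set are all assumptions made for illustration, and `toy_probe` is a hard-coded stand-in for querying a real target LLM.

```python
from collections import Counter
from typing import Callable, Dict, List, Tuple

Triple = Tuple[str, str, str]  # (subject, relation, object)


def build_memory_graph(
    anchor: str,
    probe: Callable[[str], List[Tuple[str, str]]],
    prompts: List[str],
) -> Dict[Triple, float]:
    """Probe once per prompt and weight each recovered triple by the
    fraction of prompts that elicited it (an assumed weighting)."""
    counts: Counter = Counter()
    for template in prompts:
        for relation, obj in probe(template.format(anchor=anchor)):
            counts[(anchor, relation, obj)] += 1
    return {triple: n / len(prompts) for triple, n in counts.items()}


def synthesize_forget_set(
    graph: Dict[Triple, float], min_weight: float = 0.5
) -> List[Tuple[str, str]]:
    """Turn sufficiently weighted edges into (question, answer) pairs."""
    return [
        (f"What is the {rel} of {subj}?", obj)
        for (subj, rel, obj), weight in sorted(graph.items())
        if weight >= min_weight
    ]


def toy_probe(prompt: str) -> List[Tuple[str, str]]:
    # Hypothetical stand-in for sampling the target LLM and parsing facts.
    if "born" in prompt or "city" in prompt:
        return [("birthplace", "Lisbon")]
    if "write" in prompt:
        return [("notable work", "The Glass Harbor")]
    return []


prompts = [
    "Where was {anchor} born?",
    "Which city is {anchor} from?",
    "What did {anchor} write?",
]
graph = build_memory_graph("Jane Doe", toy_probe, prompts)
forget_set = synthesize_forget_set(graph, min_weight=0.5)
print(forget_set)
```

In this toy run only the birthplace edge is recovered by more than half of the prompts, so it alone enters the synthesized forget set. The resulting pairs could then be handed to a standard unlearning method as its forget data, which is the sense in which the abstract calls MAGE pluggable.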
