Less is More: Geometric Unlearning for LLMs with Minimal Data Disclosure
Chenchen Tan, Xinghao Li, Shujie Cui, Youyang Qu, Cunjian Chen, and Longxiang Gao

TL;DR
This paper introduces Geometric Unlearning (GU), a method for selectively removing specific information from large language models without access to the original training corpus, by operating on the model's prompt-time planning states.
Contribution
GU combines a compact, low-rank geometric representation of safe behavior with a small set of synthetic prompts to unlearn target content using minimal data and no access to the original training corpus.
Findings
Strong target suppression on privacy benchmarks
Minimal impact on non-target model performance
Effective unlearning with small synthetic datasets
Abstract
As large language models (LLMs) are increasingly deployed in real-world systems, they must support post-hoc removal of specific content to meet privacy and governance requirements. This motivates selective unlearning, which suppresses information about a particular entity or topic while preserving the LLM's general utility. However, most existing LLM unlearning methods require access to the original training corpus and rely on output-level refusal tuning or broad gradient updates, creating a tension among unlearning strength, non-target preservation, and data availability. We propose Geometric Unlearning (GU), an approach that operates directly on the model's prompt-time planning states without access to the original training corpus. GU distills a compact, low-rank geometry of desired safe behavior from a small set of safe reference prompts, and uses lightweight anchor-in-context…
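The abstract is cut off before it says how the low-rank geometry is computed, so the following is only a rough sketch of the general idea, not the authors' procedure. It assumes that each safe reference prompt has already been reduced to one hidden-state vector, and it uses a plain truncated SVD (PCA) as a stand-in for the distillation step. The function names, the random stand-in data, and the choice of SVD are all hypothetical.

```python
import numpy as np


def fit_safe_subspace(states: np.ndarray, rank: int):
    """Fit a rank-`rank` affine subspace to hidden states of shape (n, d).

    Assumption: truncated SVD stands in for the paper's distillation step.
    Returns (mean, basis), where basis has shape (rank, d), orthonormal rows.
    """
    mean = states.mean(axis=0)
    centered = states - mean
    # Rows of vt are orthonormal directions, sorted by explained variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return mean, vt[:rank]


def project_onto_subspace(state: np.ndarray, mean: np.ndarray, basis: np.ndarray):
    """Return the closest point to `state` inside the fitted subspace."""
    coords = (state - mean) @ basis.T
    return mean + coords @ basis


def off_subspace_distance(state: np.ndarray, mean: np.ndarray, basis: np.ndarray) -> float:
    """Distance from `state` to the fitted subspace (0 means it lies inside)."""
    return float(np.linalg.norm(state - project_onto_subspace(state, mean, basis)))


# Hypothetical stand-in data: 32 "safe" states near a 3-dimensional subspace
# of a 16-dimensional hidden space. Real states would come from the model.
rng = np.random.default_rng(0)
d, n, k = 16, 32, 3
true_basis = np.linalg.qr(rng.normal(size=(d, k)))[0].T  # shape (k, d)
safe_states = rng.normal(size=(n, k)) @ true_basis + 0.01 * rng.normal(size=(n, d))

mean, basis = fit_safe_subspace(safe_states, rank=k)
```

Under these assumptions, `off_subspace_distance` gives a simple measure of how far a given prompt-time state sits from the safe geometry, and `project_onto_subspace` gives the nearest safe point. How GU actually uses such a geometry, including the anchor-in-context step, is not recoverable from the truncated abstract.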
