Erasing Conceptual Knowledge from Language Models

Rohit Gandikota; Sheridan Feucht; Samuel Marks; David Bau

arXiv:2410.02760 · cs.CL · July 23, 2025

TL;DR

This paper presents Erasure of Language Memory (ELM), a method for concept-level unlearning in language models. ELM reduces the likelihood of generating content related to undesired concepts while preserving the model's broader capabilities and robustness.

Contribution

Introduces ELM, a principled approach that uses the model's own introspective classification of its knowledge to guide targeted low-rank updates for concept unlearning.
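
The idea of a low-rank update can be pictured with a small example. This is a minimal sketch of a generic low-rank weight update, W' = W + scale * (B A), and not the repository's implementation; the function names, the `scale` parameter, and the toy matrices are all hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Multiply two dense matrices stored as lists of rows."""
    inner = len(b)
    assert all(len(row) == inner for row in a), "inner dimensions must match"
    cols = len(b[0])
    return [
        [sum(row[k] * b[k][j] for k in range(inner)) for j in range(cols)]
        for row in a
    ]


def apply_low_rank_update(w: Matrix, b: Matrix, a: Matrix, scale: float = 1.0) -> Matrix:
    """Return W + scale * (B @ A) without modifying W.

    B has shape (d_out, r) and A has shape (r, d_in), so the update
    B @ A has rank at most r. (Hypothetical helper for illustration.)
    """
    delta = matmul(b, a)
    return [
        [w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
        for w_row, d_row in zip(w, delta)
    ]


def trainable_parameters(d_out: int, d_in: int, rank: int) -> int:
    """Number of entries in B and A combined."""
    return rank * (d_out + d_in)


# Toy example: a 2x3 weight matrix edited by a rank-1 update.
W = [[1.0, 1.0, 1.0], [1.0, 1.0, 1.0]]
B = [[1.0], [2.0]]          # shape (2, 1)
A = [[1.0, 0.0, -1.0]]      # shape (1, 3)
W_edited = apply_low_rank_update(W, B, A)
```

Only B and A would be trained in such a scheme, so the number of edited parameters grows with `rank * (d_out + d_in)` rather than `d_out * d_in`, which is what keeps the change to the model small and targeted.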

Findings

1. ELM effectively erases targeted concepts from language models.

2. ELM-modified models score near random chance on assessments targeting the erased concepts.

3. ELM preserves model performance on unrelated tasks and enhances robustness.

Abstract

In this work, we introduce Erasure of Language Memory (ELM), a principled approach to concept-level unlearning that operates by matching distributions defined by the model's own introspective classification capabilities. Our key insight is that effective unlearning should leverage the model's ability to evaluate its own knowledge, using the language model itself as a classifier to identify and reduce the likelihood of generating content related to undesired concepts. ELM applies this framework to create targeted low-rank updates that reduce generation probabilities for concept-specific content while preserving the model's broader capabilities. We demonstrate ELM's efficacy on biosecurity, cybersecurity, and literary domain erasure tasks. Comparative evaluation reveals that ELM-modified models achieve near-random performance on assessments targeting erased concepts, while simultaneously…
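
To make the "model as its own classifier" idea concrete, the sketch below reweights a next-token distribution by a per-token classifier score and renormalizes. The abstract does not state the training objective, so the reweighting rule used here (multiplying each probability by `(1 - c) ** eta`) is an assumption chosen for illustration, and `concept_prob`, `eta`, and `erasure_target` are hypothetical names. In a real setting the classifier scores would come from the language model itself; here they are hard-coded.

```python
from typing import List


def erasure_target(
    token_probs: List[float],
    concept_prob: List[float],
    eta: float = 1.0,
) -> List[float]:
    """Build a target distribution that downweights concept-related tokens.

    token_probs[i]  : original probability of token i (sums to 1).
    concept_prob[i] : assumed classifier probability, in [0, 1], that
                      token i continues the undesired concept.
    eta             : strength of the downweighting; eta = 0 leaves the
                      distribution unchanged.
    """
    if len(token_probs) != len(concept_prob):
        raise ValueError("inputs must have the same length")
    weights = [p * (1.0 - c) ** eta for p, c in zip(token_probs, concept_prob)]
    total = sum(weights)
    if total == 0.0:
        raise ValueError("all tokens were fully suppressed")
    return [w / total for w in weights]


# Toy vocabulary of three tokens; the first is strongly concept-related.
original = [0.5, 0.3, 0.2]
scores = [0.9, 0.1, 0.1]
target = erasure_target(original, scores, eta=1.0)
```

With these toy numbers the concept-related token drops from 0.5 to roughly 0.1, and the remaining probability mass is redistributed over the other two tokens in their original proportion. A fine-tuning step would then train the edited model to match a target of this kind on concept-related text.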

Code & Models

Repositories

rohitgandikota/erasing-llm
PyTorch · Official

Taxonomy

Topics: Natural Language Processing Techniques · Semantic Web and Ontologies · Topic Modeling