KUDA: Knowledge Unlearning by Deviating Representation for Large Language Models

Ce Fang; Zhikun Zhang; Min Chen; Qing Liu; Lu Zhou; Zhe Liu; Yunjun Gao

arXiv:2602.19275 · cs.CR · February 25, 2026

TL;DR

KUDA introduces a novel method for unlearning specific knowledge in large language models by manipulating internal representations, effectively removing targeted information while preserving overall model performance.

Contribution

The paper proposes KUDA, a new approach that precisely unlearns targeted knowledge in LLMs by deviating internal representations and balancing forgetting with retention.

Findings

1. KUDA outperforms existing unlearning methods on the WMDP and MUSE benchmarks.

2. It removes the targeted knowledge while maintaining model utility.

3. It achieves a better balance between unlearning and knowledge retention.

Abstract

Large language models (LLMs) acquire a large amount of knowledge through pre-training on vast and diverse corpora. While this endows LLMs with strong capabilities in generation and reasoning, it amplifies risks associated with sensitive, copyrighted, or harmful content in training data. LLM unlearning, which aims to remove specific knowledge encoded within models, is a promising technique to reduce these risks. However, existing LLM unlearning methods often force LLMs to generate random or incoherent answers due to their inability to alter the encoded knowledge precisely. To achieve effective unlearning at the knowledge level of LLMs, we propose Knowledge Unlearning by Deviating representAtion (KUDA). We first utilize causal tracing to locate specific layers for target knowledge storage. We then design a new unlearning objective that induces the model's representations to deviate from…
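
The abstract names causal tracing as the step used to locate the layers that store the target knowledge. As a rough illustration only, and not the authors' implementation, the sketch below applies the general idea (corrupt the input, restore one layer's clean contribution, measure how much of the output is repaired) to a hypothetical toy residual model. The functions `layer_update`, `forward`, and `trace_layers`, the tanh layers, the weights, and the noise scale are all assumptions made for this example.

```python
import math
import random


def layer_update(h, weight):
    # Hypothetical toy layer: a scaled tanh that is added to the residual stream.
    return [weight * math.tanh(v) for v in h]


def forward(x, weights, patch=None):
    """Run the toy residual model and return (output, per-layer updates).

    patch is an optional (layer_index, update) pair; when given, that layer's
    update is replaced by the supplied one instead of being recomputed.
    """
    h = list(x)
    updates = []
    for i, w in enumerate(weights):
        u = layer_update(h, w)
        if patch is not None and patch[0] == i:
            u = patch[1]
        updates.append(u)
        h = [a + b for a, b in zip(h, u)]
    return sum(h), updates


def trace_layers(x, weights, noise=1.0, seed=0):
    """Score each layer by how much restoring its clean update repairs the output."""
    rng = random.Random(seed)
    clean_out, clean_updates = forward(x, weights)
    # Corrupt the input with Gaussian noise (an assumed corruption scheme).
    corrupted = [v + rng.gauss(0.0, noise) for v in x]
    corrupt_out, _ = forward(corrupted, weights)
    damage = abs(corrupt_out - clean_out)
    effects = []
    for i in range(len(weights)):
        # Re-run the corrupted input, restoring only layer i's clean update.
        patched_out, _ = forward(corrupted, weights, patch=(i, clean_updates[i]))
        effects.append(damage - abs(patched_out - clean_out))
    return effects


effects = trace_layers([0.5], weights=[0.0, 2.0, 0.0])
best_layer = max(range(len(effects)), key=lambda i: effects[i])
print(best_layer, effects)
```

In this toy setup only the middle layer has a nonzero weight, so it is the only layer whose restoration repairs the output. On a real LLM the same comparison would be run on activations captured from the model rather than on these stand-in layers.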

Taxonomy

Topics: Topic Modeling · Artificial Intelligence in Healthcare and Education · Advanced Graph Neural Networks