FaithUn: Toward Faithful Forgetting in Language Models by Investigating the Interconnectedness of Knowledge

Nakyeong Yang; Minsung Kim; Seunghyun Yoon; Joongbo Shin; Kyomin Jung

arXiv:2502.19207 · cs.CL · October 28, 2025

TL;DR

This paper introduces FaithUn, a benchmark for evaluating faithful unlearning in language models, and proposes KLUE, a method that selectively updates knowledge neurons to effectively erase interconnected knowledge.

Contribution

The paper defines superficial unlearning, creates the FaithUn benchmark, and proposes KLUE, a neuron-specific unlearning method for more faithful knowledge removal in language models.

Findings

01

Existing unlearning methods often fail to erase interconnected knowledge.

02

KLUE effectively updates only relevant neurons for faithful unlearning.

03

Experimental results show that KLUE outperforms baseline methods on real-world QA tasks.
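
The idea of updating only relevant neurons can be illustrated with a minimal sketch. This is not the paper's implementation: the function name `masked_update`, the per-neuron `scores` (standing in for whatever attribution KLUE uses to pick knowledge neurons), the top-`k` selection rule, and the plain gradient step are all assumptions made for illustration.

```python
def masked_update(weights, grads, scores, k, lr=0.1):
    """Apply one gradient step to only the k highest-scoring neurons.

    weights, grads, scores are equal-length lists with one entry per neuron.
    `scores` is a hypothetical relevance score; how it is computed is not
    specified here and is an assumption of this sketch.
    """
    # Pick the k neurons with the highest scores (ties go to the lower index).
    top = sorted(range(len(scores)), key=lambda i: (-scores[i], i))[:k]
    selected = set(top)
    # Unselected neurons keep their original weights untouched.
    return [
        w - lr * g if i in selected else w
        for i, (w, g) in enumerate(zip(weights, grads))
    ]


if __name__ == "__main__":
    new_weights = masked_update(
        weights=[1.0, 1.0, 1.0, 1.0],
        grads=[1.0, 2.0, 3.0, 4.0],
        scores=[0.1, 0.9, 0.2, 0.8],
        k=2,
        lr=0.5,
    )
    print(new_weights)
```

Restricting the step to a small selected set is what keeps the remaining parameters, and the knowledge they encode, unchanged.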

Abstract

Various studies have attempted to remove sensitive or private knowledge from a language model to prevent its unauthorized exposure. However, prior studies have overlooked the complex and interconnected nature of knowledge, where related knowledge must be carefully examined. Specifically, they have failed to evaluate whether an unlearning method faithfully erases interconnected knowledge that should be removed while retaining knowledge that appears relevant but exists in a completely different context. To resolve this problem, we first define a new concept called superficial unlearning, which refers to the phenomenon where an unlearning method either fails to erase the interconnected knowledge it should remove or unintentionally erases irrelevant knowledge. Based on the definition, we introduce a new benchmark, FaithUn, to analyze and evaluate the faithfulness of unlearning in real-world…
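
The two failure modes in the definition of superficial unlearning can be made concrete with a small check. This is a sketch under stated assumptions, not the FaithUn evaluation protocol: the function name `faithfulness_report`, the boolean per-question inputs, and the strict "any failure counts" rule are hypothetical choices made for illustration.

```python
def faithfulness_report(forget_correct, retain_correct):
    """Summarize a model's answers after unlearning.

    forget_correct: list of bools, True if the model still answers a question
        about knowledge that should be erased (the target or knowledge
        interconnected with it).
    retain_correct: list of bools, True if the model still answers a question
        about knowledge that should be kept.
    """
    # Failure mode 1: knowledge that should be gone is still answerable.
    leaked = sum(forget_correct)
    # Failure mode 2: knowledge that should be kept was erased.
    lost = len(retain_correct) - sum(retain_correct)
    return {
        "leaked": leaked,
        "lost": lost,
        "superficial": leaked > 0 or lost > 0,
    }


if __name__ == "__main__":
    print(faithfulness_report([False, True], [True, True]))
```

In practice a benchmark would likely report rates over many questions rather than a single strict flag; the flag here only mirrors the either/or wording of the definition.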

Taxonomy

Topics: Explainable Artificial Intelligence (XAI) · Topic Modeling · Advanced Graph Neural Networks