FaithUn: Toward Faithful Forgetting in Language Models by Investigating the Interconnectedness of Knowledge
Nakyeong Yang, Minsung Kim, Seunghyun Yoon, Joongbo Shin, Kyomin Jung

TL;DR
This paper introduces FaithUn, a benchmark for evaluating faithful unlearning in language models, and proposes KLUE, a method that selectively updates knowledge neurons to effectively erase interconnected knowledge.
Contribution
The paper defines superficial unlearning, creates the FaithUn benchmark, and proposes KLUE, a neuron-specific unlearning method for more faithful knowledge removal in language models.
Findings
Existing unlearning methods often fail to erase interconnected knowledge.
KLUE effectively updates only relevant neurons for faithful unlearning.
Experimental results show that KLUE outperforms baseline methods on real-world QA tasks.
Abstract
Various studies have attempted to remove sensitive or private knowledge from a language model to prevent its unauthorized exposure. However, prior studies have overlooked the complex and interconnected nature of knowledge, where related knowledge must be carefully examined. Specifically, they have failed to evaluate whether an unlearning method faithfully erases interconnected knowledge that should be removed while retaining knowledge that appears relevant but exists in a completely different context. To resolve this problem, we first define a new concept called superficial unlearning, which refers to the phenomenon where an unlearning method either fails to erase the interconnected knowledge it should remove or unintentionally erases irrelevant knowledge. Based on the definition, we introduce a new benchmark, FaithUn, to analyze and evaluate the faithfulness of unlearning in real-world…
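The definition of superficial unlearning in the abstract has two failure conditions, which can be sketched as a simple check. This is a minimal illustration, not the FaithUn evaluation protocol: the `UnlearningOutcome` type, the `is_superficial` function, and the placeholder question labels are all hypothetical names introduced here, and the sketch assumes each question has already been scored as "still answered correctly after unlearning" or not.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class UnlearningOutcome:
    """Hypothetical container for post-unlearning QA results.

    Each dict maps a question label to True if the model still answers
    it correctly after unlearning, and False otherwise.
    """
    interconnected: Dict[str, bool]  # knowledge that should also be erased
    unrelated: Dict[str, bool]       # knowledge that should be retained


def is_superficial(outcome: UnlearningOutcome) -> Tuple[bool, List[str], List[str]]:
    """Flag superficial unlearning under the two conditions in the abstract."""
    # Condition 1: interconnected knowledge that was NOT erased.
    leaked = [q for q, still_correct in outcome.interconnected.items() if still_correct]
    # Condition 2: irrelevant knowledge that was unintentionally erased.
    collateral = [q for q, still_correct in outcome.unrelated.items() if not still_correct]
    return bool(leaked or collateral), leaked, collateral


# Placeholder labels only; these are not items from the benchmark.
outcome = UnlearningOutcome(
    interconnected={"paraphrased question": False, "multi-hop question": True},
    unrelated={"same-answer question in a different context": True},
)
superficial, leaked, collateral = is_superficial(outcome)
print(superficial, leaked, collateral)
```

In this toy case the check flags the run as superficial because one interconnected question is still answered correctly, even though no unrelated knowledge was lost.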
Taxonomy
Topics: Explainable Artificial Intelligence (XAI) · Topic Modeling · Advanced Graph Neural Networks
