The Curse of Popularity: Popular Entities have Catastrophic Side Effects when Deleting Knowledge from Language Models

Ryosuke Takahashi; Go Kamoda; Benjamin Heinzerling; Keisuke Sakaguchi; Kentaro Inui

arXiv:2406.06032·cs.CL·June 11, 2024

TL;DR

This paper investigates the risks of deleting knowledge from language models, revealing that removing information about popular entities can cause severe side effects, and introduces analysis using synthetic knowledge graphs for controlled experiments.

Contribution

This paper is the first to analyze the effects of knowledge deletion in models trained on synthetic knowledge graphs, and it highlights the catastrophic side effects associated with popular entities.

Findings

01 Deleting popular entity knowledge can cause catastrophic side effects

02 Knowledge deletion impacts are more severe for popular entities

03 Synthetic knowledge graphs enable controlled experiments on knowledge removal

Abstract

Language models (LMs) encode world knowledge in their internal parameters through training. However, LMs may learn personal and confidential information from the training data, leading to privacy concerns such as data leakage. Therefore, research on knowledge deletion from LMs is essential. This study focuses on the knowledge stored in LMs and analyzes the relationship between the side effects of knowledge deletion and the entities related to the knowledge. Our findings reveal that deleting knowledge related to popular entities can have catastrophic side effects. Furthermore, this research is the first to analyze knowledge deletion in models trained on synthetic knowledge graphs, indicating a new direction for controlled experiments.

Taxonomy

Topics: Computational and Text Analysis Methods