How Data Inter-connectivity Shapes LLMs Unlearning: A Structural Unlearning Perspective

Xinchi Qiu; William F. Shen; Yihong Chen; Meghdad Kurmanji; Nicola Cancedda; Pontus Stenetorp; Nicholas D. Lane

arXiv:2406.16810·cs.LG·March 12, 2025

TL;DR

This paper introduces PISTOL, a method for creating structured datasets that reveal how inter-connectivity in data affects the difficulty of unlearning in large language models, highlighting challenges in balancing performance across domains.

Contribution

The paper presents PISTOL, a novel dataset compilation method that incorporates data inter-connectivity to study its effects on LLM unlearning, addressing limitations of previous independent data assumptions.

Findings

01. Unlearning difficulty increases with data inter-connectivity.

02. Higher knowledge graph density correlates with greater unlearning difficulty.

03. Skewed domain data makes balancing performance across domains more challenging.
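
To make "knowledge graph density" concrete, here is a minimal sketch that treats entities as nodes and contracts between them as undirected edges. It assumes the standard density of a simple undirected graph, 2E / (N(N-1)); this page does not state the paper's exact definition, and the function name `graph_density` and the toy entities are hypothetical.

```python
from itertools import combinations


def graph_density(num_entities: int, contracts: set[tuple[str, str]]) -> float:
    """Density of a simple undirected graph: 2E / (N(N-1)).

    Assumption: this is the textbook definition, not necessarily the
    one used in the paper.
    """
    if num_entities < 2:
        return 0.0
    # Treat (a, b) and (b, a) as the same contract and ignore self-loops.
    edges = {frozenset(pair) for pair in contracts if pair[0] != pair[1]}
    return 2 * len(edges) / (num_entities * (num_entities - 1))


# Hypothetical toy data: four entities, two ways of connecting them.
entities = ["A", "B", "C", "D"]
sparse_contracts = {("A", "B"), ("C", "D")}      # two isolated contracts
dense_contracts = set(combinations(entities, 2))  # every pair has a contract

sparse_density = graph_density(len(entities), sparse_contracts)
dense_density = graph_density(len(entities), dense_contracts)
```

Under this definition the sparse graph has density 1/3 and the fully connected graph has density 1. The second finding suggests that forgetting a contract is harder in the denser setting.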

Abstract

While unlearning knowledge from large language models (LLMs) is receiving increasing attention, one important aspect remains unexplored. Existing approaches and benchmarks assume data points to-be-forgotten are independent, ignoring their inter-connectivity - a fundamental characteristic of real-world data structures. In this paper, we propose PISTOL, a method for compiling structural datasets. PISTOL leverages the inherently structured nature of contractual relationships, offering several key benefits. First, it enables insights into the impact of structural data on unlearning effectiveness. Second, it provides precise and concise ground truths for clearer evaluation. Third, its attribute generation does not require input from pre-trained LLMs, mitigating confounding risks. Leveraging datasets synthesized using PISTOL, we demonstrate how data inter-connectivity impacts LLM unlearning.…

Code & Models

Datasets

xinchiqiu/PISTOL
dataset · 93 dl

Taxonomy

Topics: Mathematics, Computing, and Information Processing · Open Education and E-Learning · Natural Language Processing Techniques