Auditing Language Model Unlearning via Information Decomposition

Anmol Goel; Alan Ritter; Iryna Gurevych

arXiv:2601.15111 · cs.LG · January 22, 2026

TL;DR

This paper introduces an information-theoretic framework using Partial Information Decomposition to audit language model unlearning, revealing residual information about forgotten data and proposing a risk score for privacy protection.

Contribution

The paper presents a novel, interpretable method to evaluate unlearning effectiveness at the representation level, exposing residual knowledge and guiding privacy-preserving inference.

Findings

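01. Residual information about the forgotten data persists after unlearning.

02. Redundant information, shared by the models before and after unlearning, correlates with susceptibility to adversarial reconstruction attacks.

03. The proposed risk score helps mitigate privacy risks.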
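
For readers unfamiliar with the decomposition behind these findings, the standard two-source Partial Information Decomposition can be written as follows. The symbols are assumptions made here for illustration and are not taken from the paper: F is the forgotten data, Z_b and Z_u are representations from the model before and after unlearning, R is redundant information, U_b and U_u are the unique components, and S is synergy.

```latex
\begin{align}
I(F; Z_b, Z_u) &= R + U_b + U_u + S, \\
I(F; Z_b)      &= R + U_b, \\
I(F; Z_u)      &= R + U_u.
\end{align}
```

Under this notation, R is the component the abstract describes as residual knowledge, since it is shared across both models. Reading U_b (information present only before unlearning) as the unlearned knowledge is a plausible interpretation, but that mapping is not stated in the text on this page.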

Abstract

We expose a critical limitation in current approaches to machine unlearning in language models: despite the apparent success of unlearning algorithms, information about the forgotten data remains linearly decodable from internal representations. To systematically assess this discrepancy, we introduce an interpretable, information-theoretic framework for auditing unlearning using Partial Information Decomposition (PID). By comparing model representations before and after unlearning, we decompose the mutual information with the forgotten data into distinct components, formalizing the notions of unlearned and residual knowledge. Our analysis reveals that redundant information, shared across both models, constitutes residual knowledge that persists post-unlearning and correlates with susceptibility to known adversarial reconstruction attacks. Leveraging these insights, we propose a…
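
The claim that forgotten information "remains linearly decodable" can be illustrated with a linear probe. The following is a minimal sketch on synthetic data, not the authors' procedure: the function `linear_probe_accuracy`, the ridge-regularized least-squares probe, and the simulated representations (`base_reps`, `unlearned_reps`, `noise_reps`) are all hypothetical choices made for illustration.

```python
import numpy as np


def linear_probe_accuracy(reps, labels, train_frac=0.7, ridge=1e-3):
    """Fit a ridge-regularized least-squares linear probe on a train split
    and return its accuracy on the held-out split."""
    n = reps.shape[0]
    n_train = int(train_frac * n)
    X = np.hstack([reps, np.ones((n, 1))])  # append a bias column
    y = 2.0 * labels - 1.0                  # map {0, 1} labels to {-1, +1}
    X_tr, y_tr = X[:n_train], y[:n_train]
    X_te, y_te = X[n_train:], y[n_train:]
    # Closed-form ridge solution: (X^T X + ridge * I) w = X^T y
    A = X_tr.T @ X_tr + ridge * np.eye(X.shape[1])
    w = np.linalg.solve(A, X_tr.T @ y_tr)
    preds = np.sign(X_te @ w)
    return float(np.mean(preds == y_te))


# Synthetic stand-ins for hidden states (assumption: real usage would
# extract these from a model before and after unlearning).
rng = np.random.default_rng(0)
n, d = 2000, 32
labels = rng.integers(0, 2, size=n)  # 1 = hypothetical forget-set attribute
direction = rng.normal(size=d)
direction /= np.linalg.norm(direction)
signal = (2.0 * labels - 1.0)[:, None] * direction[None, :]

base_reps = rng.normal(size=(n, d)) + 3.0 * signal
# Assumption for the demo: unlearning attenuates the signal but does not erase it.
unlearned_reps = rng.normal(size=(n, d)) + 1.5 * signal
noise_reps = rng.normal(size=(n, d))  # control with no label information

acc_base = linear_probe_accuracy(base_reps, labels)
acc_unlearned = linear_probe_accuracy(unlearned_reps, labels)
acc_noise = linear_probe_accuracy(noise_reps, labels)
print(acc_base, acc_unlearned, acc_noise)
```

Probe accuracy well above the noise control on the post-unlearning representations would indicate residual, linearly decodable information. A probe can only show that information is present; low accuracy does not prove that it is absent.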

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Explainable Artificial Intelligence (XAI) · Ethics and Social Impacts of AI