Toward Understanding Unlearning Difficulty: A Mechanistic Perspective and Circuit-Guided Difficulty Metric

Jiali Cheng; Ziheng Chen; Chirag Agarwal; Hadi Amiri

arXiv:2601.09624 · cs.LG · January 15, 2026

TL;DR

This paper introduces a circuit-based metric called CUD to quantify and understand the difficulty of unlearning specific samples in language models, revealing mechanistic insights into why some data are harder to erase.

Contribution

The paper proposes a circuit-guided difficulty metric for unlearning that links unlearning difficulty to the model's internal mechanisms and pathways.

Findings

01 CUD reliably separates easy-to-unlearn samples from hard-to-unlearn ones.

02 Easy samples are associated with shorter, earlier circuit interactions.

03 Hard samples involve longer, deeper circuit pathways near late-stage computation.
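
To make the contrast between the last two findings concrete, here is a toy sketch. It is not the paper's CUD metric, whose definition is not given on this page. It assumes a hypothetical representation in which a sample's circuit is a list of edges between layers, each with an attribution weight, and it scores the circuit by how late and how long its edges are. The names `Edge` and `toy_depth_score`, and the equal weighting of depth and span, are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Edge:
    """Hypothetical circuit edge; not a structure defined by the paper."""
    src_layer: int  # layer index of the upstream component
    dst_layer: int  # layer index of the downstream component
    weight: float   # attribution magnitude assigned to this edge


def toy_depth_score(edges: Sequence[Edge], num_layers: int) -> float:
    """Attribution-weighted mean of edge depth and edge span, in [0, 1].

    Illustrative assumption only: higher values mean the circuit relies on
    longer edges that end in later layers.
    """
    if num_layers < 2:
        raise ValueError("num_layers must be at least 2")
    total = sum(abs(e.weight) for e in edges)
    if total == 0:
        return 0.0
    last = num_layers - 1
    score = 0.0
    for e in edges:
        depth = e.dst_layer / last                  # how late the edge ends
        span = (e.dst_layer - e.src_layer) / last   # how many layers it crosses
        score += abs(e.weight) * 0.5 * (depth + span)
    return score / total


# Two made-up circuits in a 12-layer model.
easy_circuit = [Edge(0, 1, 1.0), Edge(1, 2, 0.5)]    # short, early edges
hard_circuit = [Edge(2, 10, 1.0), Edge(5, 11, 0.5)]  # long edges ending late

easy_score = toy_depth_score(easy_circuit, num_layers=12)
hard_score = toy_depth_score(hard_circuit, num_layers=12)
print(f"easy={easy_score:.3f} hard={hard_score:.3f}")
```

Under these assumptions the circuit with short, early edges receives the lower score, which mirrors the direction of the findings above without claiming to reproduce them.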

Abstract

Machine unlearning is becoming essential for building trustworthy and compliant language models. Yet unlearning success varies considerably across individual samples: some are reliably erased, while others persist despite the same procedure. We argue that this disparity is not only a data-side phenomenon, but also reflects model-internal mechanisms that encode and protect memorized information. We study this problem from a mechanistic perspective based on model circuits, the structured interaction pathways that govern how predictions are formed. We propose Circuit-guided Unlearning Difficulty (CUD), a pre-unlearning metric that assigns each sample a continuous difficulty score using circuit-level signals. Extensive experiments demonstrate that CUD reliably separates intrinsically easy and hard samples, and remains stable across unlearning methods. We identify key circuit-level…

Taxonomy

Topics: Explainable Artificial Intelligence (XAI) · Topic Modeling · Text Readability and Simplification