An Information Theoretic Evaluation Metric For Strong Unlearning
Dongjae Jeon, Wonje Jeung, Taeheon Kim, Albert No, Jonghyun Choi

TL;DR
This paper introduces the Information Difference Index (IDI), a new information-theoretic metric that evaluates how well models forget specific data by measuring the mutual information between intermediate features and the labels to be forgotten, improving the assessment of strong unlearning in deep neural networks.
Contribution
The paper proposes IDI, a white-box metric based on mutual information, to evaluate the effectiveness of strong unlearning in deep neural networks more faithfully than existing black-box metrics.
Findings
IDI effectively measures unlearning across datasets and architectures.
IDI provides a more comprehensive assessment than black-box metrics.
Experiments validate IDI's reliability in evaluating unlearning.
Abstract
Machine unlearning (MU) aims to remove the influence of specific data from trained models, addressing privacy concerns and ensuring compliance with regulations such as the "right to be forgotten." Evaluating strong unlearning, where the unlearned model is indistinguishable from one retrained without the forgetting data, remains a significant challenge in deep neural networks (DNNs). Common black-box metrics, such as variants of membership inference attacks and accuracy comparisons, primarily assess model outputs but often fail to capture residual information in intermediate layers. To bridge this gap, we introduce the Information Difference Index (IDI), a novel white-box metric inspired by information theory. IDI quantifies retained information in intermediate features by measuring mutual information between those features and the labels to be forgotten, offering a more comprehensive…
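To make the measured quantity concrete: when features are discrete, the mutual information between features Z and forget labels Y can be computed directly from empirical counts as the sum over (z, y) of p(z, y) log[p(z, y) / (p(z) p(y))]. The sketch below is an illustration only, assuming the features have already been discretized into integer codes (a hypothetical preprocessing step). It is not the IDI estimator, which works on continuous intermediate features (one reviewer notes the paper uses InfoNCE for this).

```python
import numpy as np


def plugin_mutual_information(z_codes, labels):
    """Plug-in estimate of I(Z; Y) in nats for discrete Z and Y.

    z_codes: integer codes for (hypothetically discretized) features, shape (N,)
    labels:  integer class labels to be forgotten, shape (N,)
    """
    z_codes = np.asarray(z_codes)
    labels = np.asarray(labels)
    # Map arbitrary integer values to contiguous row/column indices.
    _, z_idx = np.unique(z_codes, return_inverse=True)
    _, y_idx = np.unique(labels, return_inverse=True)
    # Empirical joint distribution p(z, y) from co-occurrence counts.
    joint = np.zeros((z_idx.max() + 1, y_idx.max() + 1))
    np.add.at(joint, (z_idx, y_idx), 1.0)
    joint /= joint.sum()
    # Marginals p(z) (column vector) and p(y) (row vector).
    p_z = joint.sum(axis=1, keepdims=True)
    p_y = joint.sum(axis=0, keepdims=True)
    product = p_z @ p_y  # outer product p(z) p(y)
    nonzero = joint > 0  # zero-probability cells contribute nothing
    return float(np.sum(joint[nonzero] * np.log(joint[nonzero] / product[nonzero])))
```

If the codes determine the label exactly, the estimate equals H(Y) (log 4 for four balanced classes); if codes and labels are independent, it is 0. Plug-in counting breaks down for high-dimensional continuous features, which is why neural lower-bound estimators are typically used instead.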
Peer Reviews
Decision: Submitted to ICLR 2025
S1. The assertion that "black-box metrics may not be sufficient for assessing strong unlearning" is well-motivated and compelling.
S2. The paper includes extensive empirical evaluations that support the claims made by the authors.
S3. The technical quality of the paper is good, and the ideas presented may be of significant interest to the machine learning community.
W1. This study seems to focus predominantly on the single-class forgetting scenario in the main text, relegating results related to other cases (e.g., random data forgetting, multi-class forgetting) to the appendices. Given that the paper aims to provide a general approach and is not limited to single-class forgetting, the authors are strongly encouraged to include essential findings from all three tasks in the main text. For instance, empirical evaluations demonstrating whether the proposed met
Common unlearning metrics are known to be incomplete and not always reliable. This paper proposes a new evaluation method for unlearning that uses InfoNCE to estimate the mutual information between intermediate-layer features and output labels. Because it takes the internal behavior of the model into account, the method measures unlearning at a deeper level. The paper also proposes a new unlearning method with a similar motivation. The CoLA method is able to achieve stronger unlearning compared
One motivation for the new evaluation metric is the concern that current metrics only measure shallow unlearning, which can be vulnerable to relearning attacks. However, the paper does not demonstrate the strength of the new metric systematically. Figure 4 briefly mentions retraining attacks but does not include the SCRUB method; it also appears before the new metric is presented, which makes the paper somewhat confusing to read. The evaluation method uses the layers up to the $\ell$-th and trains the rest to estimate the InfoNCE
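The reviewer describes the estimation step as using the network up to the $\ell$-th layer and training the rest to estimate InfoNCE. The sketch below is a simplified stand-in under stated assumptions, not the paper's procedure: a hypothetical ReLU MLP (`frozen_prefix`) supplies the layer-$\ell$ features, and instead of training the remaining layers it fits a bilinear critic f(z, y) = z · W[:, y] by gradient ascent on the InfoNCE objective. All function names and hyperparameters here are illustrative.

```python
import numpy as np


def infonce_estimate(scores):
    """InfoNCE lower bound on I(Z; Y) from an N x N critic score matrix.

    scores[i, j] = f(z_i, y_j); the diagonal holds the positive (matched) pairs.
    """
    n = scores.shape[0]
    row_max = scores.max(axis=1, keepdims=True)
    # Numerically stable log-sum-exp over each row.
    log_denominators = row_max[:, 0] + np.log(np.exp(scores - row_max).sum(axis=1))
    return float(np.log(n) + np.mean(np.diag(scores) - log_denominators))


def frozen_prefix(x, weights, ell):
    """Hypothetical ReLU MLP: apply the first `ell` layers without training them."""
    h = x
    for w in weights[:ell]:
        h = np.maximum(h @ w, 0.0)
    return h


def train_label_critic(z, labels, num_classes, steps=500, lr=0.1):
    """Fit a bilinear critic f(z, y) = z @ W[:, y] by gradient ascent on InfoNCE.

    Returns the InfoNCE estimate on the same samples (no held-out split).
    """
    # Standardize features so a fixed step size stays stable.
    z = (z - z.mean(axis=0)) / (z.std(axis=0) + 1e-8)
    n, d = z.shape
    onehot = np.eye(num_classes)[labels]  # (n, C)
    w = np.zeros((d, num_classes))
    for _ in range(steps):
        scores = z @ w @ onehot.T  # (n, n) critic scores
        probs = np.exp(scores - scores.max(axis=1, keepdims=True))
        probs /= probs.sum(axis=1, keepdims=True)
        # Gradient of the loss (-InfoNCE) with respect to the score matrix.
        grad_scores = (probs - np.eye(n)) / n
        # Chain rule through scores = z @ w @ onehot.T, then a descent step.
        w -= lr * (z.T @ grad_scores @ onehot)
    return infonce_estimate(z @ w @ onehot.T)
```

With one-hot labels and a class-balanced batch, this InfoNCE value equals log C minus the cross-entropy of a linear softmax classifier on the same features, so it is bounded above by log C. Features from which the forget labels cannot be decoded yield estimates near zero, while label-informative features push the estimate toward log C.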
1. The topic of designing evaluation metrics for machine unlearning is both significant and challenging. This paper introduces a new metric based on the mutual information between the features and the labels to be forgotten, effectively capturing the residual information in the unlearned model.
2. A new unlearning method inspired by the proposed evaluation metric is introduced, which is greatly appreciated.
3. The presentation is clear and easy to follow.
4. Several experiments are conducted to d
1. If I understand correctly, for each unlearned model with $L$ layers, mutual information must be estimated $3L$ times, each requiring data sampling and the training of two networks. Coupled with the time needed for retraining, the overall complexity is substantial.
2. Given the complexity, the necessity of such calculations needs further scrutiny. A simpler white-box baseline, using an MLP to train a mapping from concatenated hidden features to forget_labels and obtaining a similar metric bas
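The excerpt does not give the formula that turns per-layer estimates into a single IDI score. The sketch below assumes one plausible reading of the reviewer's cost count: mutual information is estimated at each of the $L$ layers for three models (unlearned, retrained, original), and the index is the summed unlearned-minus-retrained gap normalized by the summed original-minus-retrained gap, so 0 matches retraining and 1 matches no unlearning at all. Treat this normalization, and both function names, as hypothetical rather than the paper's definition.

```python
import numpy as np


def information_difference_index(mi_unlearned, mi_retrained, mi_original):
    """Hypothetical normalized index from per-layer MI estimates (length-L sequences).

    0 -> the unlearned model carries as much forget-label information as the
         retrained model; 1 -> it carries as much as the original model.
    """
    u = np.asarray(mi_unlearned, dtype=float)
    r = np.asarray(mi_retrained, dtype=float)
    o = np.asarray(mi_original, dtype=float)
    denominator = np.sum(o - r)
    if denominator <= 0:
        raise ValueError("original model must retain more forget-label information than the retrained one")
    return float(np.sum(u - r) / denominator)


def mi_estimates_needed(num_layers):
    """Count of MI estimates under this reading: one per layer for each of three models."""
    return 3 * num_layers
```

Under this reading, an 18-layer network needs 54 separate mutual-information estimates, each with its own critic training, which is the cost the reviewer raises.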
Taxonomy
Topics: Advanced Statistical Methods and Models
