GENIU: A Restricted Data Access Unlearning for Imbalanced Data

Chenhao Zhang; Shaofei Shen; Yawen Zhao; Weitong Tony Chen; Miao Xu

arXiv:2406.07885 · cs.LG · June 13, 2024

Open Access 3 Reviews

TL;DR

GENIU is a novel framework that enables class unlearning in imbalanced datasets with restricted data access by using a VAE-based proxy generator, improving unlearning accuracy without original data.

Contribution

It introduces GENIU, the first practical method for class unlearning in imbalanced data scenarios with limited data access, utilizing a VAE for proxy generation and in-batch tuning.

Findings

01. GENIU outperforms existing unlearning methods in imbalanced data settings.

02. The VAE-based proxy generator effectively represents class distributions.

03. In-batch tuning enhances unlearning performance for majority classes.

Abstract

With the increasing emphasis on data privacy, the significance of machine unlearning has grown substantially. Class unlearning, which involves enabling a trained model to forget data belonging to a specific class learned before, is important as classification tasks account for the majority of today's machine learning as a service (MLaaS). Retraining the model on the original data, excluding the data to be forgotten (a.k.a. forgetting data), is a common approach to class unlearning. However, the availability of original data during the unlearning phase is not always guaranteed, leading to the exploration of class unlearning with restricted data access. While current unlearning methods with restricted data access usually generate proxy samples via the trained neural network classifier, they typically focus on training and forgetting balanced data. However, the imbalanced original data can…
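The retraining baseline mentioned in the abstract can be sketched in a few lines. This is only an illustration of the problem setup, not the GENIU method: the helper names `split_forget_retain` and `per_class_accuracy` and the toy class sizes are assumptions introduced here. Note that this baseline needs the original data, which is exactly what the restricted-access setting does not provide.

```python
from collections import Counter


def split_forget_retain(samples, forget_class):
    """Split (features, label) pairs into forgetting data and remaining data."""
    forget = [s for s in samples if s[1] == forget_class]
    retain = [s for s in samples if s[1] != forget_class]
    return forget, retain


def per_class_accuracy(labels, predictions):
    """Return {class: accuracy} so forgotten and retained classes can be compared."""
    total = Counter(labels)
    correct = Counter(y for y, p in zip(labels, predictions) if y == p)
    return {c: correct[c] / total[c] for c in total}


# Hypothetical imbalanced toy data: class 0 is the majority class.
samples = (
    [((i,), 0) for i in range(8)]
    + [((i,), 1) for i in range(2)]
    + [((i,), 2) for i in range(2)]
)

# Retraining-based class unlearning would train a fresh model on `retain` only.
forget, retain = split_forget_retain(samples, forget_class=0)

# Hypothetical predictions from an unlearned model: class 0 is never predicted.
accuracy = per_class_accuracy([0, 0, 1, 2], [1, 1, 1, 2])
```

Reporting accuracy per class, rather than as one aggregate number, keeps the forgotten class and the remaining classes visible separately, which matters when class sizes differ widely.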

Peer Reviews

Decision · Submitted to ICLR 2024

Reviewer 01 · Rating 1 (strong reject) · Confidence 3

Strengths

The clarity of the work is acceptable. Furthermore, the work is highly novel and original. The usage of VAE seems cool.

Weaknesses

However, I find the following critical faults with the paper:
- The baselines seem ill-defined. In the presented experiments, there is not a good way of knowing what constitutes a good delta in classification based on an unlearning request. In the results table, the authors show that after unlearning, the accuracy for the unlearned class is 0.0. I do not understand why there is any merit in this. Throughout the entire paper, there is never any mention of what constitutes a valid "forgetting" of

Reviewer 02 · Rating 5 (marginally below the acceptance threshold) · Confidence 3

Strengths

- This paper is well-written and easy for readers to understand, and its key idea is clear.
- This paper deals with scenarios that could occur in the real world, such as situations where access to original data is not possible or where a classification model is trained on imbalanced data. These are plausible constraints, and the paper provides sufficient approaches to address them.
- The authors conducted sufficient experiments to explain their algorithms and also conducted a thorough analysis

Weaknesses

- The authors train a generative model together with a classification model at the beginning. However, this can be a critical privacy issue because the generative model itself contains information about the forgetting data. Although this can be discarded after one unlearning process, it cannot be used for the next unlearning process. Therefore, it seems to be an architecture that cannot perform continuous unlearning.
- The generated proxies are mentioned to be far from the decision boundary. In

Reviewer 03 · Rating 8 (accept, good paper) · Confidence 4

Strengths

This paper introduces a novel approach to address the challenging problem of imbalanced class unlearning with restricted data access. Unlike conventional retraining methods, the proposed special proxy generator method and in-batch tuning strategy offer a new perspective on efficiently unlearning from imbalanced data, particularly when forgetting data is predominantly composed of majority class samples. The paper's innovative use of a generative approach for proxy generation, as well as the integ

Weaknesses

There are clarity issues in the technical details and notation definitions. Some related work discussions also lack clarity in illustrating how the challenges posed by the problem studied in this paper specifically affect those approaches. Furthermore, the assumption of similar data volumes for all minority classes requires clarification. It remains unclear whether this assumption is driven by the criticality of the imbalance rate as a hyperparameter in this paper. These clarity issues collecti

Taxonomy

Topics: Medical Coding and Health Information · Access Control and Trust
