DynFrs: An Efficient Framework for Machine Unlearning in Random Forest

Shurong Wang; Zhuoyang Shen; Xinbao Qiao; Tongning Zhang; Meng Zhang

arXiv:2410.01588 · cs.LG · March 20, 2025

TL;DR

DynFrs is a novel framework that enables fast and accurate machine unlearning in Random Forests, addressing privacy regulations like GDPR with minimal impact on model performance.

Contribution

The paper introduces DynFrs, a framework that combines a subsampling method (Occ(q)) with a lazy tag strategy (Lzy) to perform machine unlearning in Random Forests efficiently, and that can be adapted to several Random Forest variants.

Findings

1. Achieves significantly faster unlearning performance.
2. Maintains or improves predictive accuracy.
3. Applicable to different Random Forest variants.

Abstract

Random Forests are widely recognized for their established efficacy in classification and regression tasks, standing out in various domains such as medical diagnosis, finance, and personalized recommendations. These domains, however, are inherently sensitive to privacy concerns, as personal and confidential data are involved. With increasing demand for the right to be forgotten, particularly under regulations such as GDPR and CCPA, the ability to perform machine unlearning has become crucial for Random Forests. However, insufficient attention has been paid to this topic, and existing approaches are difficult to apply in real-world scenarios. Addressing this gap, we propose the DynFrs framework, designed to enable efficient machine unlearning in Random Forests while preserving predictive accuracy. DynFrs leverages the subsampling method Occ(q) and a lazy tag strategy Lzy, and is still…
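To make the subsampling idea concrete, here is a minimal Python sketch of an Occ(q)-style assignment. It assumes each training sample is placed in exactly ⌈q·T⌉ of the T trees; this matches the reviewers' description of "fixing the number of trees each sample is used in", but the exact rule is an assumption here, and the function names (`assign_samples_to_trees`, `unlearn`) are hypothetical rather than taken from the DynFrs repository.

```python
import math
import random


def assign_samples_to_trees(n_samples, n_trees, q, seed=0):
    """Place every sample in exactly ceil(q * n_trees) distinct trees (assumed rule)."""
    rng = random.Random(seed)
    k = math.ceil(q * n_trees)
    # trees_of_sample[i] lists the trees that were trained on sample i.
    trees_of_sample = [rng.sample(range(n_trees), k) for _ in range(n_samples)]
    # samples_of_tree[t] is the inverse map: the training subset of tree t.
    samples_of_tree = [[] for _ in range(n_trees)]
    for i, trees in enumerate(trees_of_sample):
        for t in trees:
            samples_of_tree[t].append(i)
    return trees_of_sample, samples_of_tree


def unlearn(trees_of_sample, samples_of_tree, sample_id):
    """Drop a sample from the forest; return the only trees that need updating."""
    affected = sorted(trees_of_sample[sample_id])
    for t in affected:
        samples_of_tree[t].remove(sample_id)
    trees_of_sample[sample_id] = []
    return affected


trees_of_sample, samples_of_tree = assign_samples_to_trees(
    n_samples=1000, n_trees=100, q=0.25
)
affected = unlearn(trees_of_sample, samples_of_tree, sample_id=42)
print(f"{len(affected)} of 100 trees touched by the deletion")
```

Under this assumption a deletion request touches a fixed fraction q of the forest, so the remaining trees need no work at all.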

Peer Reviews

Decision · ICLR 2025 Poster

Reviewer 01 · Rating 8 · Confidence 4

Strengths

* The method is simple yet effective, yielding an exact unlearning method for random forests.
* The approach reduces required re-training from the source by limiting the number of trees each data point influences.
* The method defers subtree re-computation until queries require that portion, making it suitable for online settings.
* Theoretical proofs demonstrate the algorithm's exactness and time complexity.
* Comprehensive experiments show superior performance compared to baselines, which suff
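The deferred re-computation this reviewer describes can be sketched with a toy one-dimensional tree. This is an illustration under stated assumptions, not the paper's algorithm: the split rule (midpoint of the minimum and maximum feature value), the tagging condition (tag a node when a deletion changes its split), and the `Node` class are all hypothetical.

```python
from collections import Counter


class Node:
    rebuilds = 0  # counts subtree rebuilds triggered by queries

    def __init__(self, samples):
        self.samples = list(samples)  # (x, label) pairs reaching this node
        self._build()

    def _split_point(self):
        """Toy split rule: None for a leaf, else the midpoint of min and max x."""
        xs = [x for x, _ in self.samples]
        labels = {y for _, y in self.samples}
        if len(labels) <= 1 or min(xs) == max(xs):
            return None
        return (min(xs) + max(xs)) / 2

    def _build(self):
        self.dirty = False
        self.left = self.right = None
        self.threshold = self._split_point()
        if self.threshold is not None:
            self.left = Node([s for s in self.samples if s[0] <= self.threshold])
            self.right = Node([s for s in self.samples if s[0] > self.threshold])

    def delete(self, sample):
        """Remove a sample; tag the node instead of rebuilding when its split changes."""
        self.samples.remove(sample)
        if self.dirty or self.threshold is None:
            return  # rebuild already pending, or a leaf with nothing below it
        if self._split_point() != self.threshold:
            self.dirty = True  # lazy tag: children are stale, rebuild on demand
            return
        child = self.left if sample[0] <= self.threshold else self.right
        child.delete(sample)

    def predict(self, x):
        if self.dirty:  # pay for the rebuild only when a query reaches this node
            Node.rebuilds += 1
            self._build()
        if self.threshold is None:
            counts = Counter(y for _, y in self.samples)
            return counts.most_common(1)[0][0] if counts else None
        child = self.left if x <= self.threshold else self.right
        return child.predict(x)


data = [(1.0, 0), (2.0, 0), (3.0, 1), (4.0, 1)]
root = Node(data)
before = root.predict(2.4)
root.delete((4.0, 1))   # changes the root split, so the root is only tagged
tagged = root.dirty
after = root.predict(2.4)  # the query triggers the deferred rebuild
print(before, tagged, after)
```

In this toy version the rebuilt tree is identical to one trained from scratch on the remaining samples, which is what makes the deferral compatible with exact unlearning. A rebuild here reconstructs the whole subtree at once; a finer-grained variant would recompute only the tagged node and pass the tag down to its children.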

Weaknesses

* Standard bootstrapping should be included as a baseline for performance comparison. The method involves two key changes: fixing the number of trees each sample is used in and using extremely randomized trees. Including separate comparisons would provide insights towards the impact of each modification.
* Space complexity analysis is missing from the paper. While the authors mention what additional information is stored for each node in Section 4.3, an explicit discussion would be valuable.
* T

Reviewer 02 · Rating 5 · Confidence 2

Strengths

1. The motivation and the contributions of this paper are clearly shown.
2. This paper provides both theoretical and experimental results to show the huge efficiency improvements compared with the retrained model.

Weaknesses

1. This paper targets the unlearning in random forests. However, the proposed method relies on extremely randomized trees instead of the general decision trees, which limits the applicability.
2. The technical part, as well as the pseudo-code in the appendix, are hard to follow. An overall pipeline or workflow can be better used to understand the proposed method.
3. The experiment, especially the performance comparisons, lacks many essential results, and it cannot prove the effectiveness of t

Reviewer 03 · Rating 8 · Confidence 4

Strengths

This is a good paper that presents a significant contribution to the unlearning field. The novelty of this work stands in the proposed framework. Even though DYNFRS includes some techniques already presented in DaRE, in particular the random splits and the storage of the updated statistics of the nodes involved in the unlearning phase, the combination of these two techniques with constrained subsampling (OCC) and the tagging strategy (LZY) is novel and it has never been explored in the literat

Weaknesses

Even though the proposal of the paper is good, the paper presents some weaknesses and unclear points that the authors should address.

**Noteworthy weaknesses**

The first point regards the presentation of the experimental methodology, which is not deepened and not clear. In Section 5.1.1, it is written that `For all baseline models, we adhere to the instructions provided in the original papers and use the same parameter settings.`. Thus, it is not guaranteed that the baselines are using the sa

Code & Models

Repositories

shurongwang/DynFrs · Official

Taxonomy

Topics: Neural Networks and Applications · Machine Learning and Data Classification · Fault Detection and Control Systems

Methods: Softmax · Attention Is All You Need