DynFrs: An Efficient Framework for Machine Unlearning in Random Forest
Shurong Wang, Zhuoyang Shen, Xinbao Qiao, Tongning Zhang, Meng Zhang

TL;DR
DynFrs is a novel framework that enables fast and accurate machine unlearning in Random Forests, helping meet privacy regulations such as GDPR with minimal impact on model performance.
Contribution
It introduces DynFrs, a framework that uses subsampling and lazy update strategies to perform machine unlearning in Random Forests efficiently, and that can be adapted to various Random Forest variants.
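As a rough illustration of the subsampling idea, the sketch below assigns every training sample to exactly ⌈qT⌉ of the T trees, so unlearning one sample only touches those trees. It assumes this is how Occ(q) behaves, which matches the reviewers' description of fixing the number of trees each sample is used in; the function name `occ_q_assignment` and its parameters are hypothetical, and the paper's actual procedure may differ (for example, by also balancing tree sizes).

```python
import math
import random


def occ_q_assignment(n_samples, n_trees, q, seed=0):
    """Hypothetical Occ(q)-style subsampling: each sample goes to exactly
    ceil(q * n_trees) distinct trees, chosen uniformly at random.

    This is an illustrative assumption, not the authors' implementation.
    """
    if not 0 < q <= 1:
        raise ValueError("q must be in (0, 1]")
    # Small tolerance so floating-point error in q * n_trees does not round k up.
    k = max(1, math.ceil(q * n_trees - 1e-9))
    rng = random.Random(seed)
    trees = [[] for _ in range(n_trees)]  # trees[t] = sample indices used by tree t
    membership = []                       # membership[i] = trees that contain sample i
    for i in range(n_samples):
        chosen = rng.sample(range(n_trees), k)  # k distinct trees, no replacement
        membership.append(chosen)
        for t in chosen:
            trees[t].append(i)
    return trees, membership


trees, membership = occ_q_assignment(n_samples=1000, n_trees=100, q=0.1)
# Unlearning sample 42 only needs to revisit the trees listed in membership[42].
print(len(membership[42]))  # 10
```

The trade-off in this sketch is that a smaller q means fewer trees to update per deletion, but each tree is trained on only about q·n samples.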
Findings
Achieves significantly faster unlearning than existing approaches.
Maintains or improves predictive accuracy.
Applicable to different Random Forest variants.
Abstract
Random Forests are widely recognized for their efficacy in classification and regression tasks and stand out in domains such as medical diagnosis, finance, and personalized recommendation. These domains, however, are inherently sensitive to privacy concerns, as they involve personal and confidential data. With increasing demand for the right to be forgotten, particularly under regulations such as GDPR and CCPA, the ability to perform machine unlearning has become crucial for Random Forests. However, insufficient attention has been paid to this topic, and existing approaches are difficult to apply in real-world scenarios. To address this gap, we propose the DynFrs framework, designed to enable efficient machine unlearning in Random Forests while preserving predictive accuracy. DynFrs leverages a subsampling method, Occ(q), and a lazy tag strategy, Lzy, and is still…
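The lazy tag idea can be sketched on a single tree: a deletion only updates the sample sets stored along the sample's root-to-leaf path and, when a split would leave one side empty, tags that node instead of rebuilding it. The subtree is recomputed only when a prediction actually reaches a tagged node. This is a simplified stand-in for Lzy under stated assumptions: the class `LazyTree`, the empty-side tagging rule, and the random-feature, random-threshold split (in the spirit of extremely randomized trees) are hypothetical illustration choices, and the paper's actual tagging conditions may differ.

```python
import random
from collections import Counter


class Node:
    def __init__(self, idx):
        self.idx = set(idx)   # training indices that reach this node
        self.feature = None   # split feature; None means leaf
        self.threshold = None
        self.left = None
        self.right = None
        self.dirty = False    # lazy tag: subtree must be rebuilt before use


class LazyTree:
    """Hypothetical single-tree sketch of lazy-tag unlearning (not the authors' code)."""

    def __init__(self, X, y, min_size=2, seed=0):
        self.X, self.y = X, y
        self.min_size = min_size
        self.rng = random.Random(seed)
        self.rebuilds = 0     # counts deferred subtree recomputations
        self.root = self._build(range(len(y)))

    def _build(self, idx):
        node = Node(idx)
        if len(node.idx) < self.min_size or len({self.y[i] for i in node.idx}) <= 1:
            return node  # too small or already pure: leaf
        # Randomized split: random feature, threshold uniform in its observed range.
        f = self.rng.randrange(len(self.X[0]))
        vals = [self.X[i][f] for i in node.idx]
        t = self.rng.uniform(min(vals), max(vals))
        left = [i for i in node.idx if self.X[i][f] <= t]
        right = [i for i in node.idx if self.X[i][f] > t]
        if not left or not right:
            return node  # degenerate split: keep as leaf
        node.feature, node.threshold = f, t
        node.left, node.right = self._build(left), self._build(right)
        return node

    def delete(self, i):
        """Unlearn sample i by updating stored sets; never rebuilds eagerly."""
        node = self.root
        while True:
            node.idx.discard(i)
            if node.dirty or node.feature is None:
                return  # reached a leaf or an already-tagged subtree
            go_left = self.X[i][node.feature] <= node.threshold
            child = node.left if go_left else node.right
            if child.idx == {i}:
                node.dirty = True  # split would leave an empty side: tag, defer rebuild
                return
            node = child

    def _refresh(self, node):
        fresh = self._build(node.idx)  # recompute the subtree from surviving samples
        node.__dict__.update(vars(fresh))
        self.rebuilds += 1

    def predict(self, x):
        node = self.root
        while True:
            if node.dirty:
                self._refresh(node)  # deferred work happens only on the queried path
            if node.feature is None:
                if not node.idx:
                    return None
                return Counter(self.y[i] for i in node.idx).most_common(1)[0][0]
            node = node.left if x[node.feature] <= node.threshold else node.right


X = [[0.0], [1.0], [2.0], [3.0], [4.0], [5.0]]
y = [0, 0, 0, 1, 1, 1]
tree = LazyTree(X, y, seed=3)
for i in (3, 4, 5):
    tree.delete(i)          # cheap: no subtree is rebuilt here
print(tree.predict([5.0]))  # 0, since only class-0 samples remain
```

In this sketch, a burst of deletions costs only path updates; recomputation is paid once, and only for subtrees that a later query actually visits.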
Peer Reviews
Decision: ICLR 2025 Poster
* The method is simple yet effective, yielding an exact unlearning method for random forests.
* The approach reduces the required re-training by limiting the number of trees each data point influences.
* The method defers subtree re-computation until queries require that portion of the tree, making it suitable for online settings.
* Theoretical proofs demonstrate the algorithm's exactness and time complexity.
* Comprehensive experiments show superior performance compared to baselines, which suff
* Standard bootstrapping should be included as a baseline for performance comparison. The method involves two key changes: fixing the number of trees each sample is used in and using extremely randomized trees. Including separate comparisons would provide insights into the impact of each modification.
* Space complexity analysis is missing from the paper. While the authors mention what additional information is stored for each node in Section 4.3, an explicit discussion would be valuable.
* T
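To make the suggested bootstrap comparison concrete, the arithmetic below contrasts how many trees must be revisited to unlearn one sample. With standard bootstrapping (n draws with replacement per tree), a given sample lands in a tree with probability 1 - (1 - 1/n)^n, roughly 1 - 1/e ≈ 0.632; under a fixed-occurrence scheme like Occ(q), assumed here to place each sample in exactly ⌈qT⌉ trees, the count is deterministic. The function names are hypothetical, and the count ignores the per-tree cost of the update itself.

```python
import math


def expected_trees_bootstrap(n_samples, n_trees):
    """Expected number of bootstrapped trees that contain a given sample."""
    p_in = 1 - (1 - 1 / n_samples) ** n_samples  # P(drawn at least once in n draws)
    return n_trees * p_in


def trees_fixed_occurrence(n_trees, q):
    """Trees containing a sample if each sample is placed in exactly ceil(q * T) trees."""
    return math.ceil(q * n_trees - 1e-9)  # tolerance guards against float round-up


T = 100
print(round(expected_trees_bootstrap(10_000, T), 1))  # 63.2
print(trees_fixed_occurrence(T, 0.1))                  # 10
```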
1. The motivation and the contributions of this paper are clearly shown.
2. This paper provides both theoretical and experimental results to show the huge efficiency improvements compared with the retrained model.
1. This paper targets unlearning in random forests. However, the proposed method relies on extremely randomized trees instead of general decision trees, which limits its applicability.
2. The technical part, as well as the pseudo-code in the appendix, is hard to follow. An overall pipeline or workflow would make the proposed method easier to understand.
3. The experiment, especially the performance comparisons, lacks many essential results, and it cannot prove the effectiveness of t
This is a good paper that presents a significant contribution to the unlearning field. The novelty of this work lies in the proposed framework. Even though DynFrs includes some techniques already presented in DaRE, in particular the random splits and the storage of the updated statistics of the nodes involved in the unlearning phase, the combination of these two techniques with constrained subsampling (OCC) and the tagging strategy (LZY) is novel and has never been explored in the literat
Even though the proposal of the paper is good, the paper presents some weaknesses and unclear points that the authors should address. Noteworthy weaknesses: The first point regards the presentation of the experimental methodology, which lacks depth and clarity. In Section 5.1.1, it is written that "For all baseline models, we adhere to the instructions provided in the original papers and use the same parameter settings." Thus, it is not guaranteed that the baselines are using the sa
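The node-statistics idea that the DaRE comparison above refers to can be sketched as follows: a node keeps, for a few candidate (feature, threshold) splits, the sample count and positive-label count on each side, so unlearning a sample is a constant-time decrement per candidate, and the subtree needs recomputation only if the best candidate changes. This is a hypothetical illustration assuming binary labels and Gini impurity; `NodeStats` and its fields are not from the paper, and in a lazy scheme a `True` return value would set a tag rather than trigger an immediate rebuild.

```python
from dataclasses import dataclass, field


@dataclass
class Candidate:
    feature: int
    threshold: float
    left: list = field(default_factory=lambda: [0, 0])   # [count, positives] at or below threshold
    right: list = field(default_factory=lambda: [0, 0])  # [count, positives] above threshold


def gini(count, pos):
    if count == 0:
        return 0.0
    p = pos / count
    return 2 * p * (1 - p)  # binary Gini impurity


def weighted_gini(c):
    n = c.left[0] + c.right[0]
    if n == 0:
        return 0.0
    return (c.left[0] * gini(*c.left) + c.right[0] * gini(*c.right)) / n


class NodeStats:
    """Hypothetical per-node statistics for unlearning with random candidate splits."""

    def __init__(self, X, y, candidates):
        self.cands = [Candidate(f, t) for f, t in candidates]
        for xi, yi in zip(X, y):
            self._update(xi, yi, +1)
        self.best = self._best_index()

    def _update(self, x, y, sign):
        # Add (sign=+1) or remove (sign=-1) one sample's contribution to every candidate.
        for c in self.cands:
            side = c.left if x[c.feature] <= c.threshold else c.right
            side[0] += sign
            side[1] += sign * y

    def _best_index(self):
        return min(range(len(self.cands)), key=lambda k: weighted_gini(self.cands[k]))

    def delete(self, x, y):
        """Remove one sample's contribution; return True if the chosen split changed."""
        self._update(x, y, -1)
        new_best = self._best_index()
        changed = new_best != self.best
        self.best = new_best
        return changed


X = [[0.0], [1.0], [2.0], [3.0], [4.0], [5.0], [6.0]]
y = [0, 0, 1, 0, 1, 1, 1]
node = NodeStats(X, y, candidates=[(0, 1.5), (0, 3.5)])
print(node.best)              # 1: threshold 3.5 has lower weighted Gini initially
print(node.delete([3.0], 0))  # True: after removing sample 3, threshold 1.5 is pure
print(node.best)              # 0
```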
