Trained Random Forests Completely Reveal your Dataset
Julien Ferry, Ricardo Fukasawa, Timothée Pascal, Thibaut Vidal

TL;DR
This paper presents a novel optimization-based attack that can fully or nearly fully reconstruct datasets used to train random forests, exposing significant privacy vulnerabilities in widely used ensemble methods.
Contribution
It introduces a new reconstruction attack leveraging constraint programming, demonstrating its effectiveness on common random forest configurations and highlighting privacy risks.
Findings
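Random forests trained without bootstrap aggregation but with feature randomization can be reconstructed completely, even with a small number of trees.
With bootstrap aggregation, the majority of the training data can still be reconstructed.
The attack relies only on information that common libraries such as scikit-learn expose for a trained forest.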
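To make the last finding concrete, here is a minimal sketch of the kind of structural information a trained scikit-learn forest exposes through its public attributes. The synthetic dataset, the hyperparameters, and the `summary` variable are assumptions chosen for illustration; this only inspects a model and is not the attack described in the paper.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Hypothetical toy data standing in for a private training set.
X, y = make_classification(n_samples=50, n_features=5, random_state=0)

# No bootstrap, but feature randomization at each split (max_features="sqrt").
forest = RandomForestClassifier(
    n_estimators=3, bootstrap=False, max_features="sqrt", random_state=0
).fit(X, y)

summary = []
for i, est in enumerate(forest.estimators_):
    t = est.tree_
    is_split = t.children_left != -1  # leaves have no children (-1)
    summary.append({
        "tree": i,
        "n_nodes": int(t.node_count),
        "root_samples": int(t.n_node_samples[0]),     # examples reaching the root
        "split_features": t.feature[is_split].tolist(),
        "split_thresholds": t.threshold[is_split].tolist(),
    })
    # t.value holds per-node class statistics (counts or fractions,
    # depending on the scikit-learn version).

for row in summary:
    print(row["tree"], row["n_nodes"], row["root_samples"])
```

Because `bootstrap=False`, every tree is fit on all 50 examples, so each root reports the full training-set size. Split features, thresholds, and per-node sample counts are all readable without any access to `X` or `y`.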
Abstract
We introduce an optimization-based reconstruction attack capable of completely or near-completely reconstructing a dataset utilized for training a random forest. Notably, our approach relies solely on information readily available in commonly used libraries such as scikit-learn. To achieve this, we formulate the reconstruction problem as a combinatorial problem under a maximum likelihood objective. We demonstrate that this problem is NP-hard, though solvable at scale using constraint programming -- an approach rooted in constraint propagation and solution-domain reduction. Through an extensive computational investigation, we demonstrate that random forests trained without bootstrap aggregation but with feature randomization are susceptible to a complete reconstruction. This holds true even with a small number of trees. Even with bootstrap aggregation, the majority of the data can also…
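The combinatorial view in the abstract can be illustrated with a toy sketch: enumerate every candidate dataset and keep those whose per-leaf class counts match what the released trees report. This uses brute-force enumeration in place of constraint programming and has no likelihood objective, so it is not the paper's formulation. The three-example binary dataset and the helper names (`stump_counts`, `consistent`) are hypothetical.

```python
from itertools import combinations_with_replacement, product

# Hypothetical private data: 3 examples, 2 binary features, binary label.
secret = [((0, 1), 0), ((1, 1), 1), ((1, 0), 1)]

def stump_counts(data, feature):
    """Per-leaf class counts of a depth-1 tree that splits on `feature`."""
    counts = {0: [0, 0], 1: [0, 0]}
    for x, label in data:
        counts[x[feature]][label] += 1
    return counts

# All possible (features, label) points an example could take.
points = [(x, label) for x in product((0, 1), repeat=2) for label in (0, 1)]

def consistent(released, n):
    """Enumerate all size-n datasets (as multisets) matching every stump."""
    return [
        list(d)
        for d in combinations_with_replacement(points, n)
        if all(stump_counts(d, f) == c for f, c in released.items())
    ]

# No bootstrap: every released stump was fit on the full dataset.
one_tree = {0: stump_counts(secret, 0)}
two_trees = {f: stump_counts(secret, f) for f in (0, 1)}

print(len(consistent(one_tree, len(secret))))   # candidates given one stump
print(len(consistent(two_trees, len(secret))))  # candidates given both stumps
```

In this toy instance, each additional tree removes candidates, and two stumps already leave a single consistent dataset. Enumeration grows exponentially with dataset size, which is why a real attack would need a solver rather than this loop.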
Taxonomy
Topics: Data Mining Algorithms and Applications · Machine Learning and Data Classification
