TL;DR
TimberStrike is an optimization-based attack that reveals sensitive training data in federated tree-based models, exposing privacy vulnerabilities across multiple frameworks and datasets.
Contribution
This paper introduces TimberStrike, the first dataset reconstruction attack targeting federated tree-based models, demonstrating significant privacy risks and analyzing mitigation strategies.
Findings
Reconstructs 73-95% of the target training data in the reported experiments
Affects multiple federated learning frameworks, including Flower, NVFlare, and FedTree
Differential Privacy partially mitigates the attack by reducing data leakage
Abstract
Federated Learning has emerged as a privacy-oriented alternative to centralized Machine Learning, enabling collaborative model training without direct data sharing. While extensively studied for neural networks, the security and privacy implications of tree-based models remain underexplored. This work introduces TimberStrike, an optimization-based dataset reconstruction attack targeting horizontally federated tree-based models. Our attack, carried out by a single client, exploits the discrete nature of decision trees by using split values and decision paths to infer sensitive training data from other clients. We evaluate TimberStrike on state-of-the-art federated gradient boosting implementations across multiple frameworks, including Flower, NVFlare, and FedTree, demonstrating their vulnerability to privacy breaches. On a publicly available stroke prediction dataset, TimberStrike…
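To give a rough intuition for why split values and decision paths can leak information, the sketch below walks a toy decision tree and turns each root-to-leaf path into per-feature interval constraints. It is not the TimberStrike procedure, which the abstract describes only as optimization-based; the tree layout, the names TREE and leaf_boxes, and the "go left if x[feature] < threshold" convention are all assumptions made for illustration.

```python
import math

# Hypothetical toy tree (not taken from the paper). Internal nodes hold a
# feature index and a split threshold; leaves hold a prediction value.
# Assumed convention: a sample goes left when x[feature] < threshold.
TREE = {
    "feature": 0, "threshold": 50.0,
    "left": {"leaf": -0.4},
    "right": {
        "feature": 1, "threshold": 120.0,
        "left": {"leaf": 0.1},
        "right": {"leaf": 0.7},
    },
}

def leaf_boxes(node, n_features, box=None):
    """Return (leaf_value, box) pairs, where box[i] = (lo, hi) means
    any sample reaching that leaf satisfies lo <= x[i] < hi."""
    if box is None:
        box = [(-math.inf, math.inf)] * n_features
    if "leaf" in node:
        return [(node["leaf"], list(box))]
    f, t = node["feature"], node["threshold"]
    lo, hi = box[f]
    # Going left tightens the upper bound; going right tightens the lower bound.
    left_box = list(box)
    left_box[f] = (lo, min(hi, t))
    right_box = list(box)
    right_box[f] = (max(lo, t), hi)
    return (leaf_boxes(node["left"], n_features, left_box)
            + leaf_boxes(node["right"], n_features, right_box))

BOXES = leaf_boxes(TREE, n_features=2)
for value, box in BOXES:
    print(value, box)
```

Each leaf corresponds to a box in feature space, so knowing which leaf a training sample falls into bounds its feature values. Intersecting such boxes across many trees narrows the bounds further, which is the kind of signal a reconstruction attack could exploit.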
