TL;DR
This paper presents a federated learning framework for Random Forest classifiers that ensures data privacy, supports distributed training, and achieves competitive accuracy in healthcare applications, addressing a gap in privacy-preserving, interpretable machine learning.
Contribution
The paper introduces a novel federated Random Forest framework using PySyft, enabling privacy-preserving, distributed training with features like weighted averaging and incremental learning.
Findings
Stays within 9% of the accuracy of centralized models
Supports privacy-preserving distributed training
Demonstrates effectiveness on healthcare benchmarks
Abstract
Privacy and regulatory barriers often hinder centralized machine learning solutions, particularly in sectors like healthcare where data cannot be freely shared. Federated learning has emerged as a powerful paradigm to address these concerns; however, existing frameworks primarily support gradient-based models, leaving a gap for more interpretable, tree-based approaches. This paper introduces a federated learning framework for Random Forest classifiers that preserves data privacy and provides robust performance in distributed settings. By leveraging PySyft for secure, privacy-aware computation, our method enables multiple institutions to collaboratively train Random Forest models on locally stored data without exposing sensitive information. The framework supports weighted model averaging to account for varying data distributions, incremental learning to progressively refine models, and…
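The weighted model averaging mentioned in the abstract could look roughly like the sketch below. It is an illustration under stated assumptions, not the paper's implementation: it assumes each institution's local Random Forest returns a class-probability vector for a sample, and that the aggregator weights each institution by its local sample count. The function names `weighted_average_proba` and `predict_label` and the example numbers are hypothetical, and PySyft is deliberately left out so the snippet runs with the standard library alone.

```python
from typing import List, Sequence


def weighted_average_proba(
    site_probas: Sequence[Sequence[float]],
    site_sizes: Sequence[int],
) -> List[float]:
    """Combine per-site class probabilities, weighting each site by its sample count.

    Assumption: weights are proportional to local dataset size. The paper may
    use a different weighting scheme.
    """
    if not site_probas or len(site_probas) != len(site_sizes):
        raise ValueError("need exactly one probability vector per site")
    total = sum(site_sizes)
    if total <= 0:
        raise ValueError("total sample count must be positive")

    n_classes = len(site_probas[0])
    combined = [0.0] * n_classes
    for proba, size in zip(site_probas, site_sizes):
        if len(proba) != n_classes:
            raise ValueError("all sites must report the same number of classes")
        weight = size / total
        for k, p in enumerate(proba):
            combined[k] += weight * p
    return combined


def predict_label(combined: Sequence[float]) -> int:
    """Return the index of the class with the highest combined probability."""
    return max(range(len(combined)), key=lambda k: combined[k])


# Hypothetical example: three institutions with different amounts of local data.
probas = [[0.9, 0.1], [0.4, 0.6], [0.2, 0.8]]
sizes = [100, 300, 600]
combined = weighted_average_proba(probas, sizes)
label = predict_label(combined)
```

Only the probability vectors and sample counts leave each site in this sketch; the raw records stay local. Whether that is enough for a given privacy requirement depends on the threat model, which this example does not address.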
