Principled Federated Random Forests for Heterogeneous Data
Rémi Khellaf, Erwan Scornet, Aurélien Bellet, Julie Josse

TL;DR
FedForest is a novel federated random forest algorithm that effectively handles data heterogeneity, closely approximates centralized performance, and introduces client-based splits for personalization, all while maintaining communication efficiency.
Contribution
The paper introduces FedForest, a principled federated random forest method that accommodates diverse data heterogeneity and enables personalization through client indicator splits.
Findings
FedForest closely matches centralized performance on heterogeneous benchmarks.
The splitting procedure approximates centralized split selection using client statistics.
FedForest remains communication-efficient while handling complex data shifts.
Abstract
Random Forests (RF) are among the most powerful and widely used predictive models for centralized tabular data, yet few methods exist to adapt them to the federated learning setting. Unlike most federated learning approaches, the piecewise-constant nature of RF prevents exact gradient-based optimization. As a result, existing federated RF implementations rely on unprincipled heuristics: for instance, aggregating decision trees trained independently on clients fails to optimize the global impurity criterion, even under simple distribution shifts. We propose FedForest, a new federated RF algorithm for horizontally partitioned data that naturally accommodates diverse forms of client data heterogeneity, from covariate shift to more complex outcome shift mechanisms. We prove that our splitting procedure, based on aggregating carefully chosen client statistics, closely approximates the split…
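To make the idea of scoring a split from aggregated client statistics concrete, here is a minimal sketch for a regression tree with a squared-error impurity. It is an illustration under stated assumptions, not FedForest's actual procedure: the function names (`client_stats`, `aggregate`, `split_impurity`, `best_threshold`), the choice of statistics (counts, sums, and sums of squares on each side of a threshold), and the shared list of candidate thresholds are all hypothetical and do not come from the paper.

```python
from typing import List, Sequence, Tuple

# Hypothetical per-side statistics: (n_left, sum_left, sumsq_left,
#                                    n_right, sum_right, sumsq_right)
Stats = Tuple[float, float, float, float, float, float]
Client = Tuple[Sequence[float], Sequence[float]]  # (feature values, outcomes)


def client_stats(x: Sequence[float], y: Sequence[float], threshold: float) -> Stats:
    """Computed locally by one client; raw rows never leave the client."""
    left = [yi for xi, yi in zip(x, y) if xi <= threshold]
    right = [yi for xi, yi in zip(x, y) if xi > threshold]
    return (
        float(len(left)), float(sum(left)), float(sum(v * v for v in left)),
        float(len(right)), float(sum(right)), float(sum(v * v for v in right)),
    )


def aggregate(stats: List[Stats]) -> Stats:
    """Server side: element-wise sum of the clients' statistics."""
    return tuple(sum(column) for column in zip(*stats))  # type: ignore[return-value]


def sse(n: float, s: float, q: float) -> float:
    """Sum of squared errors around the mean, from count, sum, sum of squares."""
    return q - s * s / n if n > 0 else 0.0


def split_impurity(stats: Stats) -> float:
    """Total within-child squared error for the split described by `stats`."""
    n_l, s_l, q_l, n_r, s_r, q_r = stats
    return sse(n_l, s_l, q_l) + sse(n_r, s_r, q_r)


def best_threshold(clients: List[Client], candidates: Sequence[float]) -> float:
    """Pick the candidate threshold with the lowest aggregated impurity."""
    scored = []
    for t in candidates:
        pooled = aggregate([client_stats(x, y, t) for x, y in clients])
        scored.append((split_impurity(pooled), t))
    return min(scored)[1]
```

For a fixed candidate threshold, these summed statistics give the same squared-error impurity as pooling the raw data, so in this simplified setting any gap to the centralized split comes only from how the candidate thresholds are chosen. How FedForest selects its statistics and candidates, and how it handles outcome shift or client indicator splits, is not covered by this sketch.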
