Federated Random Forest for Partially Overlapping Clinical Data

Youngjun Park; Cord Eric Schmidt; Benedikt Marcel Batton; Anne-Christin Hauschild

arXiv:2405.20738·cs.LG·June 3, 2024

TL;DR

This paper develops and evaluates a federated random forest approach tailored for clinical datasets with partially overlapping features, addressing privacy and heterogeneity challenges in healthcare data analysis.

Contribution

It introduces a novel federated random forest model that handles partially overlapping features, improving collaborative analysis of heterogeneous clinical data.

Findings

01. Federated random forests outperform local models in partially overlapping data scenarios.

02. The approach maintains high accuracy even with limited feature overlap.

03. The approach remains effective across datasets with class imbalance.

Abstract

In the healthcare sector, awareness of data privacy and the corresponding data protection regulations, together with heterogeneous and non-harmonized data, pose major challenges to large-scale data analysis. Moreover, clinical data often involve partially overlapping features, because some observations may be missing for reasons such as differences in procedures, diagnostic tests, or other recorded patient history information across hospitals or institutes. Addressing the challenges posed by partially overlapping features and incomplete data in clinical datasets requires a comprehensive approach. In the medical domain in particular, federated random forests achieve promising outcomes whenever features align. However, most standard algorithms, such as random forest, require all data sets to have identical parameters. Therefore, in this work…
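
To make the setting concrete, the following is a minimal sketch of one way a tree ensemble could be pooled across sites whose feature sets only partially overlap. It is not the paper's method, which the truncated abstract does not describe. Everything here is an illustrative assumption: the feature names (`age`, `glucose`, `bmi`), the toy records, the helper names, the use of one-split stumps in place of full decision trees, and the rule that a tree votes only when its feature is present in the record.

```python
import random
from collections import Counter

def train_stump(rows, labels, feature):
    """Fit a one-split tree (stump) on one feature by minimizing training errors."""
    best = None
    for threshold in sorted({r[feature] for r in rows}):
        left = [y for r, y in zip(rows, labels) if r[feature] <= threshold]
        right = [y for r, y in zip(rows, labels) if r[feature] > threshold]
        left_label = Counter(left).most_common(1)[0][0]
        right_label = Counter(right).most_common(1)[0][0] if right else left_label
        errors = sum(y != left_label for y in left) + sum(y != right_label for y in right)
        if best is None or errors < best[0]:
            best = (errors, threshold, left_label, right_label)
    _, threshold, left_label, right_label = best
    return {"feature": feature, "threshold": threshold,
            "left": left_label, "right": right_label}

def train_local_forest(rows, labels, n_trees, rng):
    """Train stumps on bootstrap samples, each using one randomly chosen local feature."""
    features = sorted(rows[0].keys())
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]  # bootstrap sample
        feature = rng.choice(features)                  # random feature choice
        forest.append(train_stump([rows[i] for i in idx],
                                  [labels[i] for i in idx], feature))
    return forest

def federated_predict(forest, record, default=0):
    """Majority vote over the trees whose feature is available in this record."""
    votes = [t["left"] if record[t["feature"]] <= t["threshold"] else t["right"]
             for t in forest if t["feature"] in record]
    return Counter(votes).most_common(1)[0][0] if votes else default

# Hypothetical toy data: both sites record "age", but only site A records
# "glucose" and only site B records "bmi".
site_a_rows = [{"age": 30, "glucose": 80}, {"age": 35, "glucose": 85},
               {"age": 40, "glucose": 90}, {"age": 60, "glucose": 150},
               {"age": 65, "glucose": 160}, {"age": 70, "glucose": 170}]
site_a_labels = [0, 0, 0, 1, 1, 1]
site_b_rows = [{"age": 28, "bmi": 21}, {"age": 33, "bmi": 22},
               {"age": 41, "bmi": 23}, {"age": 62, "bmi": 31},
               {"age": 66, "bmi": 33}, {"age": 72, "bmi": 35}]
site_b_labels = [0, 0, 0, 1, 1, 1]

rng = random.Random(0)
# Only fitted trees leave each site; the raw patient records stay local.
global_forest = (train_local_forest(site_a_rows, site_a_labels, 10, rng)
                 + train_local_forest(site_b_rows, site_b_labels, 10, rng))

prediction = federated_predict(global_forest, {"age": 68, "glucose": 155})
```

In this sketch the sites exchange only thresholds and leaf labels, and a record from site A is scored by every pooled tree that splits on `age` or `glucose`, while trees that split on `bmi` abstain. A real system would use full trees and would need to consider what the shared split values themselves reveal about local data.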

Taxonomy

Topics: Privacy-Preserving Technologies in Data