Patient Clustering Improves Efficiency of Federated Machine Learning to predict mortality and hospital stay time using distributed Electronic Medical Records
Li Huang, Dianbo Liu

TL;DR
This paper introduces a community-based federated learning algorithm that clusters distributed electronic medical records into meaningful groups, improving predictive accuracy and reducing communication costs while preserving data privacy.
Contribution
The novel CBFL algorithm effectively clusters non-IID EMRs into clinically meaningful communities for improved federated learning performance.
Findings
CBFL outperforms baseline FL in ROC AUC and PR AUC.
Communities' performance correlates with their dissimilarity to others.
Data remains local, enhancing privacy and reducing communication costs.
Abstract
Electronic medical records (EMRs) support the development of machine learning algorithms for predicting disease incidence, patient response to treatment, and other healthcare events. So far, however, most algorithms have been centralized, taking little account of the decentralized, not independently and identically distributed (non-IID), and privacy-sensitive characteristics of EMRs, which can complicate data collection, sharing, and learning. To address this challenge, we introduced a community-based federated machine learning (CBFL) algorithm and evaluated it on non-IID ICU EMRs. Our algorithm clustered the distributed data into clinically meaningful communities that captured similar diagnoses and geographical locations, and learnt one model for each community. Throughout the learning process, the data was kept locally at the hospitals, while locally computed results were aggregated on a server.…
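The server-side step described above (one model per community, built only from locally computed results) can be illustrated with a minimal sketch. It is not the paper's implementation: the abstract does not say how communities are formed or how results are combined, so the sketch assumes that each hospital's community label is already known and that the server takes a sample-weighted average of model weights, as in standard federated averaging. The function name `aggregate_by_community` and the toy numbers are hypothetical.

```python
from collections import defaultdict


def aggregate_by_community(updates):
    """Average locally trained weights separately for each community.

    `updates` is a list of (community_id, n_samples, weights) tuples, one per
    hospital. Only these summaries reach the server; raw records never do.
    ASSUMPTION: sample-weighted averaging; the source does not specify the rule.
    """
    weighted_sums = {}
    sample_totals = defaultdict(int)
    for community, n_samples, weights in updates:
        if community not in weighted_sums:
            weighted_sums[community] = [0.0] * len(weights)
        for i, w in enumerate(weights):
            weighted_sums[community][i] += n_samples * w
        sample_totals[community] += n_samples
    return {
        c: [s / sample_totals[c] for s in sums]
        for c, sums in weighted_sums.items()
    }


# Hypothetical toy updates: two hospitals in community "A", one in "B".
updates = [
    ("A", 100, [1.0, 2.0]),
    ("A", 300, [3.0, 4.0]),
    ("B", 50, [0.5, -1.0]),
]
models = aggregate_by_community(updates)
print(models)
```

In this sketch a hospital with more records pulls its community's model further toward its own weights, and communities never share parameters with one another.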
Taxonomy
Topics: Machine Learning in Healthcare · Artificial Intelligence in Healthcare · Privacy-Preserving Technologies in Data
