A Robust Pipeline for Differentially Private Federated Learning on Imbalanced Clinical Data using SMOTETomek and FedProx
Rodrigo Tertulino

TL;DR
This paper develops a robust federated learning pipeline for imbalanced clinical data, combining SMOTETomek and FedProx to enhance privacy and utility in cardiovascular risk prediction.
Contribution
It introduces a novel combination of data balancing and optimization techniques to improve federated learning performance on imbalanced, non-IID medical datasets.
Findings
SMOTETomek effectively balances clinical data at client level.
Optimized FedProx outperforms FedAvg in non-IID settings.
With differential privacy at a budget of epsilon = 9.0, recall can still exceed 77%.
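To make the FedProx finding concrete, here is a minimal sketch of a single client's local update. It is not the paper's implementation. It assumes a logistic regression model, and the function names, toy data, and the values of `mu`, `lr`, and `epochs` are hypothetical choices made for illustration. FedProx adds a proximal penalty (mu/2)·||w − w_global||² to each client's local loss, which discourages local weights from drifting far from the global model on non-IID data. Setting `mu=0` recovers a plain FedAvg-style local step.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fedprox_local_update(w_global, X, y, mu=0.1, lr=0.1, epochs=5):
    """Full-batch gradient descent on logistic loss plus the FedProx proximal term.

    Assumption: logistic regression stands in for whatever model the clients train.
    """
    w = list(w_global)
    n = len(X)
    for _ in range(epochs):
        # Gradient of the mean logistic loss on this client's data.
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi))) - yi
            for j, xj in enumerate(xi):
                grad[j] += err * xj / n
        # Proximal term contributes mu * (w - w_global) to the gradient.
        w = [
            wj - lr * (gj + mu * (wj - wg))
            for wj, gj, wg in zip(w, grad, w_global)
        ]
    return w


def drift(w, w_global):
    """Euclidean distance between local and global weights."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(w, w_global)))


# Hypothetical toy client: first column is a bias feature.
X = [[1.0, 2.0], [1.0, -1.0], [1.0, 3.0], [1.0, 0.5]]
y = [1, 0, 1, 0]
w_global = [0.0, 0.0]

w_fedavg = fedprox_local_update(w_global, X, y, mu=0.0)   # no proximal term
w_fedprox = fedprox_local_update(w_global, X, y, mu=1.0)  # with proximal term
```

Comparing `drift(w_fedprox, w_global)` with `drift(w_fedavg, w_global)` shows the proximal term keeping the client closer to the global model. In a full pipeline the server would then average the returned client weights, as in FedAvg.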
Abstract
Federated Learning (FL) presents a groundbreaking approach for collaborative health research, allowing model training on decentralized data while safeguarding patient privacy. FL offers formal security guarantees when combined with Differential Privacy (DP). The integration of these technologies, however, introduces a significant trade-off between privacy and clinical utility, a challenge further complicated by the severe class imbalance often present in medical datasets. The research presented herein addresses these interconnected issues through a systematic, multi-stage analysis. An FL framework was implemented for cardiovascular risk prediction, where initial experiments showed that standard methods struggled with imbalanced data, resulting in a recall of zero. To overcome such a limitation, we first integrated the hybrid Synthetic Minority Over-sampling Technique with Tomek Links…
