Towards More Efficient Data Valuation in Healthcare Federated Learning using Ensembling
Sourav Kumar, A. Lakshminarayanan, Ken Chang, Feri Guretno, Ivan Ho Mien, Jayashree Kalpathy-Cramer, Pavitra Krishnaswamy, Praveer Singh

TL;DR
This paper introduces SaFE, an efficient method for calculating Shapley values in healthcare federated learning, enabling accurate contribution assessment without prohibitive computational costs.
Contribution
The paper presents SaFE, a novel ensembling-based technique for Shapley value computation in healthcare federated learning that closely approximates exact values and improves on existing approximation methods.
Findings
SaFE closely approximates exact Shapley values.
SaFE outperforms current approximation methods in accuracy.
SaFE is applicable to medical imaging with heterogeneous data.
Abstract
Federated Learning (FL), wherein multiple institutions collaboratively train a machine learning model without sharing data, is becoming popular. Participating institutions might not contribute equally: some contribute more data, some better-quality data, and some more diverse data. To fairly rank the contributions of different institutions, the Shapley value (SV) has emerged as the method of choice. Exact SV computation is impossibly expensive, especially when there are hundreds of contributors, so existing SV computation techniques use approximations. However, in healthcare, where the number of contributing institutions is unlikely to be of a colossal scale, computing exact SVs is still exorbitantly expensive, but not impossible. For such settings, we propose an efficient SV computation technique called SaFE (Shapley Value for Federated Learning using Ensembling). We empirically show that SaFE…
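To make the cost argument in the abstract concrete, the sketch below computes exact Shapley values by enumerating every coalition of contributors. It is not SaFE and not the authors' code; it is the textbook definition applied to a toy problem. The names `exact_shapley`, `toy_utility`, and `SIZES`, and the utility itself (fraction of a hypothetical data pool held by a coalition), are assumptions made for illustration. In an FL setting, each utility call would instead correspond to obtaining and evaluating a model for that coalition, which is why the number of coalitions dominates the cost.

```python
from itertools import combinations
from math import factorial


def exact_shapley(players, utility):
    """Exact Shapley values via full coalition enumeration.

    `utility` maps a frozenset of players to a number. With n players
    there are 2**n distinct coalitions, so the cost grows exponentially.
    """
    n = len(players)
    values = {p: 0.0 for p in players}
    for p in players:
        others = [q for q in players if q != p]
        for k in range(n):  # k = size of the coalition that p joins
            # Standard Shapley weight |S|! (n - |S| - 1)! / n!
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                # Marginal contribution of p to coalition s
                values[p] += weight * (utility(s | {p}) - utility(s))
    return values


# Hypothetical toy data: three institutions with different dataset sizes.
SIZES = {"A": 100, "B": 300, "C": 600}


def toy_utility(coalition):
    # Assumed stand-in for model performance: share of the data pool held.
    return sum(SIZES[i] for i in coalition) / 1000.0


sv = exact_shapley(list(SIZES), toy_utility)
for name, value in sv.items():
    print(f"{name}: {value:.3f}")
```

Because this toy utility is additive, each institution's Shapley value equals its own share of the pool, and the values sum to the utility of the full coalition. With three contributors there are only 8 coalitions; with 10 there are 1,024 and with 20 more than a million, which is the regime the abstract describes as expensive but not impossible.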
Taxonomy
Topics: Privacy-Preserving Technologies in Data · Statistical Methods and Inference · Radiomics and Machine Learning in Medical Imaging
