A Communication-Efficient Distributed Algorithm for Learning with Heterogeneous and Structurally Incomplete Multi-Site Data
Xiaokang Liu, Yuchen Yang, Yifei Sun, Jiang Bian, Yanyuan Ma, Raymond J. Carroll, Yong Chen

TL;DR
This paper introduces a communication-efficient distributed algorithm for biomedical data integration. It handles both heterogeneity across sites and structurally missing data without sharing individual-level data.
Contribution
The paper presents a heterogeneity-aware distributed inference framework based on the density-tilted generalized method of moments (GMM), addressing data heterogeneity and structural missingness in multi-site biomedical studies.
Findings
The algorithm is communication-efficient and heterogeneity-aware.
Theoretical asymptotic properties are established.
Simulation studies validate the method's effectiveness.
Abstract
In multicenter biomedical research, integrating data from multiple decentralized sites provides more robust and generalizable findings, owing to the larger sample size and the ability to account for between-site heterogeneity. However, sharing individual-level data across sites is often difficult due to patient privacy concerns and regulatory restrictions. To overcome this challenge, many distributed algorithms that fit a global model by communicating only aggregated information across sites have been proposed. A major challenge in applying existing distributed algorithms to real-world data is that their validity often relies on the assumption that data across sites are independently and identically distributed, which is frequently violated in practice. In biomedical applications, data distributions across clinical sites can be heterogeneous. Additionally, the set of covariates…
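To make the idea of "communicating only aggregated information" concrete, here is a minimal sketch in Python with NumPy. It is not the paper's density-tilted GMM algorithm: it is a generic textbook illustration using ordinary least squares, in which each site shares only the summary matrices X'X and X'y. The function names `site_summary` and `fit_global`, the simulated data, and the coefficient values are all assumptions made for illustration. The sketch also assumes identically distributed sites, which is exactly the assumption the abstract notes is often violated in practice.

```python
import numpy as np

def site_summary(X, y):
    """Run locally at one site; only these aggregates leave the site."""
    return X.T @ X, X.T @ y

def fit_global(summaries):
    """Combine per-site aggregates into a single least-squares estimate."""
    xtx = sum(s[0] for s in summaries)
    xty = sum(s[1] for s in summaries)
    return np.linalg.solve(xtx, xty)

# Simulated homogeneous sites (hypothetical data, illustration only).
rng = np.random.default_rng(0)
beta_true = np.array([1.0, -2.0, 0.5])
sites = []
for n in (200, 350, 500):
    X = rng.normal(size=(n, 3))
    y = X @ beta_true + rng.normal(scale=0.1, size=n)
    sites.append((X, y))

# Distributed fit: individual-level rows never leave their site.
beta_distributed = fit_global([site_summary(X, y) for X, y in sites])

# Reference fit on pooled individual-level data, for comparison only.
X_all = np.vstack([X for X, _ in sites])
y_all = np.concatenate([y for _, y in sites])
beta_pooled = np.linalg.lstsq(X_all, y_all, rcond=None)[0]
```

In this linear, homogeneous setting the aggregated fit matches the pooled fit up to floating-point error, and each site sends its summaries once. When site distributions differ or sites record different covariates, this simple aggregation no longer applies directly, which is the gap the paper targets.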
Taxonomy
Topics: Privacy-Preserving Technologies in Data · Statistical Methods and Inference · Machine Learning and Algorithms
