Software for Distributed Computation on Medical Databases: A Demonstration Project
Balasubramanian Narasimhan, Daniel L. Rubin, Samuel M. Gross, Marina Bendersky, Philip W. Lavori

TL;DR
This paper presents software tools built on R that facilitate distributed computation on medical databases, enabling collaborative modeling while respecting privacy and data heterogeneity.
Contribution
The paper introduces a flexible, open-source software framework for distributed statistical modeling across heterogeneous medical databases, supporting privacy and collaboration.
Findings
Successful implementation of a site-stratified Cox model.
Effective distributed computation of a rank-k SVD.
The software supports heterogeneous database environments.
Abstract
Bringing together the information latent in distributed medical databases promises to personalize medical care by enabling reliable, stable modeling of outcomes with rich feature sets (including patient characteristics and treatments received). However, there are barriers to aggregation of medical data, due to lack of standardization of ontologies, privacy concerns, proprietary attitudes toward data, and a reluctance to give up control over end use. Aggregation of data is not always necessary for model fitting. In models based on maximizing a likelihood, the computations can be distributed, with aggregation limited to the intermediate results of calculations on local data, rather than raw data. Distributed fitting is also possible for singular value decomposition. There has been work on the technical aspects of shared computation for particular applications, but little has been…
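The idea that likelihood-based fitting needs only intermediate results from each site can be illustrated with a minimal Python sketch. This is not the paper's R software. It assumes a plain logistic regression as the likelihood (chosen for brevity, not the site-stratified Cox model), and the names `local_summaries` and `fit_distributed` are hypothetical. Because the log-likelihood is a sum over patients, its gradient and Hessian are sums of per-site pieces, so a coordinator can run Newton's method using only those pieces.

```python
import numpy as np

def local_summaries(X, y, beta):
    """Run at one site on its own data; only the gradient and Hessian leave the site."""
    p = 1.0 / (1.0 + np.exp(-X @ beta))      # fitted probabilities for local rows
    grad = X.T @ (y - p)                     # local score vector
    hess = -(X.T * (p * (1.0 - p))) @ X      # local Hessian of the log-likelihood
    return grad, hess

def fit_distributed(sites, n_features, n_iter=25, tol=1e-10):
    """Coordinator: sum the per-site summaries and take Newton steps."""
    beta = np.zeros(n_features)
    for _ in range(n_iter):
        parts = [local_summaries(X, y, beta) for X, y in sites]
        grad = sum(g for g, _ in parts)
        hess = sum(h for _, h in parts)
        step = np.linalg.solve(hess, grad)
        beta = beta - step                   # Newton update: beta - H^{-1} g
        if np.max(np.abs(step)) < tol:
            break
    return beta

# Synthetic data (made up for illustration), split across three hypothetical sites.
rng = np.random.default_rng(0)
X_all = rng.normal(size=(300, 3))
true_beta = np.array([0.5, -1.0, 0.25])
y_all = rng.binomial(1, 1.0 / (1.0 + np.exp(-X_all @ true_beta))).astype(float)

sites = [(X_all[i:i + 100], y_all[i:i + 100]) for i in range(0, 300, 100)]
beta_distributed = fit_distributed(sites, n_features=3)
beta_pooled = fit_distributed([(X_all, y_all)], n_features=3)
```

In this sketch the distributed fit and the pooled fit agree up to floating-point rounding, since the summed per-site gradients and Hessians equal the ones computed on the aggregated data.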
