Distributed Kaplan-Meier Analysis via the Influence Function with Application to COVID-19 and COVID-19 Vaccine Adverse Events
Malcolm Risk, Xu Shi, Lili Zhao

TL;DR
This paper introduces a distributed learning method for constructing Kaplan-Meier curves using the influence function, enabling multi-center survival analysis without sharing individual data, demonstrated on COVID-19 outcomes.
Contribution
The paper develops a novel distributed estimator for Kaplan-Meier curves that maintains statistical efficiency and privacy in multi-center studies, and applies it to COVID-19 data.
Findings
Thromboembolic events (blood clots) were more frequent after COVID-19 infection than after vaccination.
The distributed estimator is unbiased and as efficient as the estimator computed on pooled data.
The method enables privacy-preserving, timely survival analysis across multiple centers.
Abstract
During the COVID-19 pandemic, regulatory decision-making was hampered by a lack of timely and high-quality data on rare outcomes. Studying rare outcomes following infection and vaccination requires conducting multi-center observational studies, where sharing individual-level data is a privacy concern. In this paper, we conduct a multi-center observational study of thromboembolic events following COVID-19 and COVID-19 vaccination without sharing individual-level data. We accomplish this by developing a novel distributed learning method for constructing Kaplan-Meier (KM) curves and inverse propensity weighted KM curves with statistical inference. We sequentially update curves site-by-site using the KM influence function, which is a measure of the direction in which an observation should shift our estimate and so can be used to incorporate new observations without access to previous data.…
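To make the idea of an influence function concrete, the sketch below computes the Kaplan-Meier estimate at a single time point on one dataset, together with the textbook empirical influence value of each observation. It is an illustration only and not the authors' distributed algorithm: the function name `km_and_influence`, the toy data, and the tie convention (a subject is at risk at time u if its observed time is at least u) are assumptions made for this example.

```python
import numpy as np

def km_and_influence(times, events, t):
    """Kaplan-Meier estimate S(t) and per-observation empirical influence values.

    Hypothetical helper for illustration; not taken from the paper.
    times  : observed follow-up times (event or censoring)
    events : 1 if the event was observed, 0 if censored
    """
    times = np.asarray(times, dtype=float)
    events = np.asarray(events, dtype=int)
    n = len(times)

    # Distinct event times up to t, with at-risk counts Y_j and event counts d_j.
    event_times = np.unique(times[(events == 1) & (times <= t)])
    at_risk = np.array([(times >= u).sum() for u in event_times], dtype=float)
    deaths = np.array([((times == u) & (events == 1)).sum() for u in event_times],
                      dtype=float)

    # Product-limit estimate of S(t).
    surv = np.prod(1.0 - deaths / at_risk)

    # Hazard increment divided by the at-risk proportion: n * d_j / Y_j^2.
    comp = n * deaths / at_risk**2

    influence = np.empty(n)
    for i in range(n):
        # Jump term: nonzero only if subject i had an event at or before t.
        jump = 0.0
        if events[i] == 1 and times[i] <= t:
            jump = n / (times >= times[i]).sum()
        # Compensator: accumulated over event times up to min(t, times[i]).
        compensator = comp[event_times <= times[i]].sum()
        influence[i] = -surv * (jump - compensator)
    return surv, influence

# Toy data (made up for this example).
times = [1, 2, 3, 4, 5]
events = [1, 0, 1, 1, 0]
surv, infl = km_and_influence(times, events, t=3)

# Influence-function-based standard error of S(t).
se = np.sqrt(np.mean(infl**2) / len(times))
```

On this toy data the estimate at t = 3 is (4/5)(2/3) = 8/15. The influence values sum to zero: subjects with an event before t pull the estimate down, while subjects still under observation after t pull it up. Averaging the squared influence values gives a variance estimate, which is what allows inference from per-observation contributions.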
