Estimating Covariate-balanced Survival Curve in Distributed Data Environment using Data Collaboration Quasi-Experiment
Akihiro Toyoda, Yuji Kawamata, Tomoru Nakayama, Akira Imakura, Tetsuya Sakurai, Yukihiko Okada

TL;DR
This paper introduces a privacy-preserving framework for estimating covariate-balanced survival curves from distributed medical data, enabling collaboration without sharing raw data and outperforming single-site analyses.
Contribution
The proposed method allows covariate-adjusted survival analysis in distributed settings using low-dimensional data representations, without raw data exchange.
Findings
Outperforms single-site analyses on simulated data and five publicly available medical datasets
Handles both horizontal and vertical data distributions
Preserves privacy by exchanging only low-dimensional representations, with minimal communication
Abstract
The sharing of patient-level data necessary for covariate-adjusted survival analysis between medical institutions is difficult due to privacy protection restrictions. We propose a privacy-preserving framework that estimates balanced Kaplan-Meier curves from distributed observational data without exchanging raw data. Each institution sends only the low-dimensional representation obtained through dimensionality reduction of the covariate matrix. Analysts reconstruct the aggregated dataset, perform propensity score matching, and estimate survival curves. Experiments using simulation datasets and five publicly available medical datasets showed that the proposed method consistently outperformed single-site analyses. This method can handle both horizontal and vertical data distribution scenarios and enables the collaborative acquisition of reliable survival curves with minimal communication…
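The last two analyst-side steps named in the abstract, propensity score matching and Kaplan-Meier estimation, can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation. The function names `greedy_match` and `kaplan_meier`, the caliper value, and the toy inputs are all hypothetical. The propensity scores are assumed to have been estimated already (for example, by a logistic regression fitted on the reconstructed covariates).

```python
from collections import Counter


def greedy_match(scores, treated, caliper=0.1):
    """Greedy 1:1 nearest-neighbour matching without replacement.

    scores  : precomputed propensity scores (assumed given)
    treated : 1/0 treatment indicators, same length as scores
    caliper : hypothetical maximum allowed score difference
    Returns a list of (treated_index, control_index) pairs.
    """
    controls = [i for i, t in enumerate(treated) if not t]
    pairs = []
    for i in (i for i, t in enumerate(treated) if t):
        if not controls:
            break
        # Closest remaining control on the propensity score.
        j = min(controls, key=lambda c: abs(scores[c] - scores[i]))
        if abs(scores[j] - scores[i]) <= caliper:
            pairs.append((i, j))
            controls.remove(j)
    return pairs


def kaplan_meier(times, events):
    """Kaplan-Meier estimate as [(t, S(t))] at each distinct event time.

    times  : observed follow-up times
    events : 1 if the event occurred, 0 if the observation was censored
    """
    deaths = Counter(t for t, e in zip(times, events) if e)
    surv, curve = 1.0, []
    for t in sorted(deaths):
        # Subjects still under observation just before time t.
        at_risk = sum(1 for u in times if u >= t)
        surv *= 1.0 - deaths[t] / at_risk
        curve.append((t, surv))
    return curve


# Toy usage: match on scores, then estimate one curve per arm.
scores = [0.30, 0.32, 0.70, 0.68, 0.10]
treated = [1, 0, 1, 0, 0]
pairs = greedy_match(scores, treated)

times = [1, 2, 2, 3, 4]
events = [1, 1, 0, 1, 0]
treated_curve = kaplan_meier([times[i] for i, _ in pairs],
                             [events[i] for i, _ in pairs])
control_curve = kaplan_meier([times[j] for _, j in pairs],
                             [events[j] for _, j in pairs])
```

Greedy matching is chosen here only for brevity. The abstract does not specify which matching algorithm or caliper the authors use, so an actual analysis should follow the paper's own settings.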
Taxonomy
Topics: Privacy-Preserving Technologies in Data · Advanced Causal Inference Techniques · Data Quality and Management
