Privacy-Preserving Methods for Vertically Partitioned Incomplete Data
Yi Deng, Xiaoqian Jiang, Qi Long

TL;DR
This paper introduces a privacy-preserving distributed framework for analyzing vertically partitioned incomplete health data, enabling collaborative analysis without sharing individual-level data, and demonstrates its effectiveness through simulations and real data.
Contribution
The paper proposes a novel distributed analysis method that preserves privacy while handling missing data in vertically partitioned datasets.
Findings
The proposed methods perform as well as pooled-data analysis in simulations
They outperform naive approaches to handling missing data
They are effective on a real-world health dataset
Abstract
Distributed health data networks that use information from multiple sources have drawn substantial interest in recent years. However, missing data are prevalent in such networks and present significant analytical challenges. The current state-of-the-art methods for handling missing data require pooling data into a central repository before analysis, which may not be possible in a distributed health data network. In this paper, we propose a privacy-preserving distributed analysis framework for handling missing data when data are vertically partitioned. In this framework, each institution with a particular data source utilizes the local private data to calculate necessary intermediate aggregated statistics, which are then shared to build a global model for handling missing data. To evaluate our proposed methods, we conduct simulation studies that clearly demonstrate that the proposed…
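The idea of sharing only intermediate statistics computed from local data can be illustrated with a minimal sketch. This is not the paper's algorithm, and it does not handle missing data; it is a generic dual-form ridge regression, chosen here only because it makes the vertical-partition setting concrete. It assumes two hypothetical sites (`X_a`, `X_b`) that hold different covariates for the same patients, an outcome `y` available to whoever aggregates, and a penalty `lam`. Each site shares an n x n Gram matrix rather than its covariate columns; no formal privacy guarantee is claimed for this exchange.

```python
import numpy as np

def local_gram(X_site):
    """Computed at one site from its own columns only (n x n matrix)."""
    return X_site @ X_site.T

def dual_ridge_fit(gram_total, y, lam):
    """Solve (K + lam * I) alpha = y using only the summed Gram matrix."""
    n = gram_total.shape[0]
    return np.linalg.solve(gram_total + lam * np.eye(n), y)

# Synthetic data standing in for two sites (all values are illustrative).
rng = np.random.default_rng(0)
n = 50
X_a = rng.normal(size=(n, 2))   # covariates held by hypothetical site A
X_b = rng.normal(size=(n, 3))   # covariates held by hypothetical site B
y = rng.normal(size=n)          # outcome, assumed visible to the aggregator
lam = 1.0

# Each site shares its Gram matrix; the aggregator sums them.
K = local_gram(X_a) + local_gram(X_b)
alpha = dual_ridge_fit(K, y, lam)

# Each site maps the shared dual weights back to its own coefficients.
beta_a = X_a.T @ alpha
beta_b = X_b.T @ alpha

# Reference: the same ridge fit on pooled data, for comparison only.
X = np.hstack([X_a, X_b])
beta_pooled = np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)
```

In this sketch the distributed coefficients match the pooled fit because X^T (X X^T + lam I)^{-1} y equals (X^T X + lam I)^{-1} X^T y, which is the same kind of equivalence to pooled analysis that the abstract reports for the proposed methods.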
Taxonomy
Topics: Privacy-Preserving Technologies in Data · Data Quality and Management · Data-Driven Disease Surveillance
