An Analytical Approach to Privacy and Performance Trade-Offs in Healthcare Data Sharing
Yusi Wei, Hande Y. Benson, and Muge Capan

TL;DR
This paper evaluates privacy-preserving data anonymization methods in healthcare, balancing patient privacy with the utility of machine learning models, especially for vulnerable populations, using real hospital data.
Contribution
The paper compares three anonymization techniques (k-anonymity, the method of Zheng et al., and MO-OBAM) and shows that MO-OBAM preserves utility better than the other two while strengthening privacy protection in healthcare data sharing.
Findings
k-anonymity offers limited privacy protection
Zheng et al.'s method and MO-OBAM provide stronger safeguards
MO-OBAM maintains ML model performance with minimal utility loss
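To make the k-anonymity notion in these findings concrete, the following is a minimal sketch, not the paper's implementation. It uses the standard definition: a dataset is k-anonymous when every combination of quasi-identifier values is shared by at least k records. The field names (`age_group`, `race`, `prior_admissions`) and the toy records are hypothetical and are not taken from the study's dataset.

```python
from collections import Counter


def class_key(record, quasi_identifiers):
    """Build the tuple of quasi-identifier values that defines a record's group."""
    return tuple(record[q] for q in quasi_identifiers)


def equivalence_class_sizes(records, quasi_identifiers):
    """Count how many records share each quasi-identifier combination."""
    return Counter(class_key(r, quasi_identifiers) for r in records)


def achieved_k(records, quasi_identifiers):
    """Return the size of the smallest group, i.e. the largest k that holds."""
    sizes = equivalence_class_sizes(records, quasi_identifiers)
    return min(sizes.values()) if sizes else 0


def at_risk_records(records, quasi_identifiers, k):
    """Return records whose group has fewer than k members."""
    sizes = equivalence_class_sizes(records, quasi_identifiers)
    return [r for r in records if sizes[class_key(r, quasi_identifiers)] < k]


# Hypothetical toy data; attribute names and values are illustrative only.
QUASI_IDENTIFIERS = ["age_group", "race", "prior_admissions"]
records = [
    {"age_group": "18-44", "race": "A", "prior_admissions": 0},
    {"age_group": "18-44", "race": "A", "prior_admissions": 0},
    {"age_group": "18-44", "race": "A", "prior_admissions": 0},
    {"age_group": "65+", "race": "B", "prior_admissions": 4},
]

print(achieved_k(records, QUASI_IDENTIFIERS))
print(at_risk_records(records, QUASI_IDENTIFIERS, k=2))
```

In this toy data the single older, frequently admitted patient forms a group of size one, which is the kind of unique attribute combination the abstract describes as making some sub-populations more exposed to privacy attacks.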
Abstract
The secondary use of healthcare data is vital for research and clinical innovation, but it raises concerns about patient privacy. This study investigates how to balance privacy preservation and data utility in healthcare data sharing, considering the perspectives of both data providers and data users. Using a dataset of adult patients hospitalized between 2013 and 2015, we predict whether sepsis was present at admission or developed during the hospital stay. We identify sub-populations, such as older adults, frequently hospitalized patients, and racial minorities, that are especially vulnerable to privacy attacks due to their unique combinations of demographic and healthcare utilization attributes. These groups are also critical for machine learning (ML) model performance. We evaluate three anonymization methods: k-anonymity, the technique by Zheng et al., and the MO-OBAM model-based…
