An Analysis Of Protected Health Information Leakage In Deep-Learning Based De-Identification Algorithms
Salman Seyedi, Li Xiong, Shamim Nemati, Gari D. Clifford

TL;DR
This study investigates whether a state-of-the-art deep learning de-identification model leaks protected health information. It finds no strong evidence that an individual's membership in the training data can be inferred from the model, suggesting that data sharing is safe in this setting.
Contribution
The paper provides an empirical analysis of privacy risks in a deep learning-based medical de-identification model, a question that has not been extensively studied before.
Findings
Model output does not reveal training data membership.
Membership inference attacks were unsuccessful.
No empirical evidence of individual identification in training data.
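To make the findings above concrete, here is a minimal Python sketch of how the success of a membership inference attack is commonly scored. It is an illustration only, not the authors' attack: the function names `attack_auc` and `threshold_attack` are hypothetical, and it assumes the attacker has some per-record score (for example, the model's confidence on that record) for records known to be inside and outside the training set. An AUC near 0.5 means the attacker does no better than chance, which is the kind of outcome the findings describe.

```python
from typing import List, Sequence


def attack_auc(member_scores: Sequence[float],
               nonmember_scores: Sequence[float]) -> float:
    """Area under the ROC curve for a score-based membership attack.

    Equals the probability that a randomly chosen training (member) record
    receives a higher attack score than a randomly chosen held-out
    (non-member) record, with ties counted as one half.
    """
    if not member_scores or not nonmember_scores:
        raise ValueError("both score lists must be non-empty")
    wins = 0.0
    for m in member_scores:
        for n in nonmember_scores:
            if m > n:
                wins += 1.0
            elif m == n:
                wins += 0.5
    return wins / (len(member_scores) * len(nonmember_scores))


def threshold_attack(scores: Sequence[float], threshold: float) -> List[bool]:
    """Guess 'member' for every record whose score reaches the threshold."""
    return [s >= threshold for s in scores]


if __name__ == "__main__":
    # Hypothetical scores, invented purely for illustration.
    members = [0.91, 0.88, 0.95, 0.90]
    nonmembers = [0.90, 0.89, 0.94, 0.92]
    print("attack AUC:", attack_auc(members, nonmembers))
    print("guesses at 0.9:", threshold_attack(members + nonmembers, 0.9))
```

The pairwise loop is quadratic in the number of records, which is fine for a sketch; a rank-based computation would be the usual choice for large evaluation sets.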
Abstract
The increasing complexity of algorithms for analyzing medical data, including de-identification tasks, raises the possibility that complex algorithms are learning not just the general representation of the problem, but specifics of given individuals within the data. Modern legal frameworks specifically prohibit the intentional or accidental distribution of patient data, but have not addressed this potential avenue for leakage of such protected health information. Modern deep learning algorithms have the highest potential for such leakage due to the complexity of the models. Recent research in the field has highlighted such issues in non-medical data, but all analysis is likely to be data and algorithm specific. We, therefore, chose to analyze a state-of-the-art free-text de-identification algorithm based on LSTM (Long Short-Term Memory) and its potential in encoding any individual in the…
Taxonomy
Topics: Privacy-Preserving Technologies in Data · Cryptography and Data Security · Internet Traffic Analysis and Secure E-voting
Methods: Sigmoid Activation · Tanh Activation · Long Short-Term Memory
