An Analysis Of Protected Health Information Leakage In Deep-Learning Based De-Identification Algorithms

Salman Seyedi; Li Xiong; Shamim Nemati; Gari D. Clifford

arXiv:2101.12099 · cs.LG · May 24, 2021 · 1 citation

Open Access

TL;DR

This study investigates whether a state-of-the-art deep learning de-identification model leaks protected health information. It finds no strong evidence that the membership of an individual's data in the training set can be inferred from the model, suggesting that such models are safe to use in data sharing.

Contribution

The paper provides an empirical analysis of the privacy risks posed by a deep learning-based medical de-identification model, a question that has not been extensively studied before.

Findings

01. Model output does not reveal training data membership.

02. Membership inference attacks were unsuccessful.

03. No empirical evidence that individuals in the training data can be identified.
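To make the findings above concrete, here is a minimal sketch of how a membership inference attack is commonly scored. It is a generic confidence-threshold attack, not the attack used in the paper, and it assumes the attacker already has one confidence score per record from the target model. The function names `attack_auc` and `best_threshold_accuracy` and the synthetic scores are hypothetical, chosen only for illustration.

```python
import random


def attack_auc(member_scores, nonmember_scores):
    """Probability that a random member outscores a random non-member.

    Ties count as half a win. A value near 0.5 means the scores carry
    no usable membership signal; a value near 1.0 means strong leakage.
    """
    wins = 0.0
    for m in member_scores:
        for n in nonmember_scores:
            if m > n:
                wins += 1.0
            elif m == n:
                wins += 0.5
    return wins / (len(member_scores) * len(nonmember_scores))


def best_threshold_accuracy(member_scores, nonmember_scores):
    """Best balanced accuracy of the rule 'score >= t implies member'."""
    best = 0.5  # chance level for a balanced attack
    for t in sorted(set(member_scores) | set(nonmember_scores)):
        tpr = sum(s >= t for s in member_scores) / len(member_scores)
        tnr = sum(s < t for s in nonmember_scores) / len(nonmember_scores)
        best = max(best, (tpr + tnr) / 2)
    return best


# Synthetic illustration (assumption): members and non-members drawn from
# the same distribution, i.e. a model that does not memorize its training set.
rng = random.Random(0)
members = [rng.gauss(0.9, 0.05) for _ in range(500)]
nonmembers = [rng.gauss(0.9, 0.05) for _ in range(500)]
auc_same_distribution = attack_auc(members, nonmembers)
```

When the two score distributions overlap, as in the synthetic case here, `attack_auc` stays close to 0.5, which is what an unsuccessful attack looks like under this scoring.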

Abstract

The increasing complexity of algorithms for analyzing medical data, including de-identification tasks, raises the possibility that complex algorithms are learning not just the general representation of the problem, but specifics of given individuals within the data. Modern legal frameworks specifically prohibit the intentional or accidental distribution of patient data, but have not addressed this potential avenue for leakage of such protected health information. Modern deep learning algorithms have the highest potential of such leakage due to complexity of the models. Recent research in the field has highlighted such issues in non-medical data, but all analysis is likely to be data and algorithm specific. We, therefore, chose to analyze a state-of-the-art free-text de-identification algorithm based on LSTM (Long Short-Term Memory) and its potential in encoding any individual in the…

Taxonomy

Topics: Privacy-Preserving Technologies in Data · Cryptography and Data Security · Internet Traffic Analysis and Secure E-voting

Methods: Sigmoid Activation · Tanh Activation · Long Short-Term Memory
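For context, the three listed methods fit together in the standard LSTM cell: sigmoid activations drive the gates and tanh activations shape the candidate and output states. The equations below are the textbook formulation; the exact variant used by the de-identification model is not stated here, and the symbols ($W$, $U$, $b$, $x_t$, $h_t$, $c_t$) are conventional notation rather than the paper's.

```latex
\begin{aligned}
i_t &= \sigma(W_i x_t + U_i h_{t-1} + b_i) && \text{(input gate)} \\
f_t &= \sigma(W_f x_t + U_f h_{t-1} + b_f) && \text{(forget gate)} \\
o_t &= \sigma(W_o x_t + U_o h_{t-1} + b_o) && \text{(output gate)} \\
\tilde{c}_t &= \tanh(W_c x_t + U_c h_{t-1} + b_c) && \text{(candidate cell state)} \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t && \text{(cell state update)} \\
h_t &= o_t \odot \tanh(c_t) && \text{(hidden state)}
\end{aligned}
```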