TL;DR
This paper demonstrates that deep learning models can re-identify patients from chest X-ray images, posing privacy risks despite anonymization efforts, with high accuracy across multiple datasets and long time spans.
Contribution
To the best of the authors' knowledge, it is the first to show that a well-trained deep learning system can recover patient identity from chest X-ray data, revealing privacy vulnerabilities in publicly available medical datasets.
Findings
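Achieved an AUC of 0.9940 and a classification accuracy of 95.55% when verifying whether two frontal chest X-ray images come from the same patient
Demonstrated re-identification from images taken more than ten years apart
Validated the results on external data, including CheXpert and COVID-19 images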
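To make the two verification metrics above concrete, here is a minimal sketch of how an AUC and a classification accuracy can be computed from pairwise similarity scores. It is not the authors' code: the function names, the toy scores, and the 0.5 decision threshold are all assumptions chosen for illustration. A label of 1 means the two images in a pair come from the same patient, and 0 means they come from different patients.

```python
def verification_auc(scores, labels):
    """AUC as the probability that a same-patient pair outscores a
    different-patient pair (ties count as half a win)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one pair of each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def verification_accuracy(scores, labels, threshold=0.5):
    """Fraction of pairs classified correctly at a fixed threshold
    (the 0.5 default is an illustrative assumption)."""
    predictions = [1 if s >= threshold else 0 for s in scores]
    correct = sum(int(p == y) for p, y in zip(predictions, labels))
    return correct / len(labels)


# Hypothetical similarity scores for six image pairs (made-up values).
demo_scores = [0.9, 0.8, 0.4, 0.7, 0.2, 0.1]
demo_labels = [1, 1, 1, 0, 0, 0]

demo_auc = verification_auc(demo_scores, demo_labels)
demo_acc = verification_accuracy(demo_scores, demo_labels)
print(f"AUC = {demo_auc:.4f}, accuracy = {demo_acc:.4f}")
```

The AUC does not depend on a threshold, whereas the accuracy does, which is one reason the two figures are usually reported together.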
Abstract
With the rise and ever-increasing potential of deep learning techniques in recent years, publicly available medical datasets became a key factor to enable reproducible development of diagnostic algorithms in the medical domain. Medical data contains sensitive patient-related information and is therefore usually anonymized by removing patient identifiers, e.g., patient names before publication. To the best of our knowledge, we are the first to show that a well-trained deep learning system is able to recover the patient identity from chest X-ray data. We demonstrate this using the publicly available large-scale ChestX-ray14 dataset, a collection of 112,120 frontal-view chest X-ray images from 30,805 unique patients. Our verification system is able to identify whether two frontal chest X-ray images are from the same person with an AUC of 0.9940 and a classification accuracy of 95.55%. We…
