Navigating open data sharing and privacy in the age of clinical AI research: from reidentification to pseudo-reidentification
Shahin Hallaj, Anna Heinke, Fritz Gerald P. Kalaw, Nayoon Gim, Marian Blazes, Julia Owen, Eamon Dysinger, Erik S. Benton, Benjamin A. Cordier, Nicholas G. Evans, Jennifer Li-Pook-Than, Michael P. Snyder, Camille Nebeker, Linda M. Zangwill, Sally L. Baxter, Shannon McWeeney

TL;DR
This paper discusses how AI challenges data privacy in clinical research and introduces a new approach to open data sharing through the AI-READI project.
Contribution
The paper introduces the concept of pseudo-reidentification and proposes a novel open data sharing approach for clinical AI research.
Findings
AI methods are increasing the risk of reidentifying de-identified clinical data.
The AI-READI project offers a new model for balancing data sharing and privacy in clinical research.
Pseudo-reidentification challenges traditional definitions of identifiable data.
Abstract
Sharing clinical research data is key to increasing the pace of medical discoveries that improve human health. However, concern about study participants' privacy, confidentiality, and safety is a major factor that deters researchers from openly sharing clinical data, even after deidentification. This concern is heightened by the evolution of artificial intelligence (AI) approaches that pose an ever-increasing threat of reidentifying study participants. Here, we discuss the challenges AI approaches create that are blurring the lines between identifiable and non-identifiable data. We present a concept of pseudo-reidentification and discuss how these challenges provide opportunities for rethinking open data sharing practices in clinical research. We highlight the novel open data sharing approach we have established as part of the AI-READI (Artificial Intelligence Ready,…
Taxonomy
Topics: Privacy-Preserving Technologies in Data · Ethics in Clinical Research · Artificial Intelligence in Healthcare and Education
