A Utility-preserving De-identification Pipeline for Cross-hospital Radiology Data Sharing

Chenhao Liu; Zelin Wen; Yan Tong; Junjie Zhu; Xinyu Tian; Yuchi Liu; Ashu Gupta; Syed M. S. Islam; Tom Gedeon; Yue Yao

arXiv:2604.07128·cs.CV·April 9, 2026

TL;DR

This paper presents a privacy-preserving de-identification pipeline for radiology data that maintains diagnostic utility for AI training and cross-hospital sharing.

Contribution

The authors introduce a novel utility-preserving de-identification pipeline combining text filtering and generative image synthesis for radiology data sharing.

Findings

01 Effective removal of privacy-sensitive information confirmed by reduced identity accuracy.

02 Models trained on de-identified data achieve comparable diagnostic accuracy to original data.

03 De-identified data enhances cross-hospital model performance when combined with local data.

Abstract

Large-scale radiology data are critical for developing robust medical AI systems. However, sharing such data across hospitals remains heavily constrained by privacy concerns. Existing de-identification research in radiology mainly focuses on removing identifiable information to enable compliant data release. Yet whether de-identified radiology data can still preserve sufficient utility for large-scale vision-language model training and cross-hospital transfer remains underexplored. In this paper, we introduce a utility-preserving de-identification pipeline (UPDP) for cross-hospital radiology data sharing. Specifically, we compile a blacklist of privacy-sensitive terms and a whitelist of pathology-related terms. For radiology images, we use a generative filtering mechanism that synthesizes privacy-filtered and pathology-reserved counterparts of the original images. These synthetic image…
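To make the blacklist/whitelist idea concrete, here is a minimal sketch of how term lists like these might be applied to report text. It is not the authors' implementation. The term lists, the `filter_report` function, the `[REDACTED]` mask, and the rule that whitelisted terms take precedence over blacklisted ones are all assumptions made for illustration. The abstract does not specify how the two lists interact, and the generative image filtering step is not covered here.

```python
import re

# Hypothetical term lists; the paper's actual lists are not reproduced here.
BLACKLIST = {"smith", "mercy", "hospital"}              # privacy-sensitive terms
WHITELIST = {"pneumonia", "effusion", "cardiomegaly"}   # pathology-related terms

# Split text into alphabetic words and the runs of other characters between
# them, so that joining the pieces reproduces the original spacing.
TOKEN = re.compile(r"[A-Za-z]+|[^A-Za-z]+")


def filter_report(text, blacklist=BLACKLIST, whitelist=WHITELIST,
                  mask="[REDACTED]"):
    """Mask blacklisted words; whitelisted words are always kept (assumption)."""
    pieces = []
    for tok in TOKEN.findall(text):
        key = tok.lower()
        if key in whitelist:
            pieces.append(tok)    # pathology term: keep even if also blacklisted
        elif key in blacklist:
            pieces.append(mask)   # privacy-sensitive term: replace with mask
        else:
            pieces.append(tok)    # everything else passes through unchanged
    return "".join(pieces)


if __name__ == "__main__":
    print(filter_report("Patient Smith at Mercy Hospital: left pleural effusion."))
```

Giving the whitelist precedence is one plausible way to keep clinically meaningful eponyms or terms that also appear as names. A real pipeline would also need to handle multi-word phrases, dates, and identifiers, which this word-level sketch ignores.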
