DeID-GPT: Zero-shot Medical Text De-Identification by GPT-4

Zhengliang Liu; Yue Huang; Xiaowei Yu; Lu Zhang; Zihao Wu; Chao Cao; Haixing Dai; Lin Zhao; Yiwei Li; Peng Shu; Fang Zeng; Lichao Sun; Wei Liu; Dinggang Shen; Quanzheng Li; Tianming Liu; Dajiang Zhu; Xiang Li

arXiv:2303.11032·cs.CL·December 2, 2025·89 cites

DeID-GPT: Zero-shot Medical Text De-Identification by GPT-4

Zhengliang Liu, Yue Huang, Xiaowei Yu, Lu Zhang, Zihao Wu, Chao Cao, Haixing Dai, Lin Zhao, Yiwei Li, Peng Shu, Fang Zeng, Lichao Sun, Wei Liu, Dinggang Shen, Quanzheng Li, Tianming Liu, Dajiang Zhu, Xiang Li

PDF

Open Access 1 Repo

TL;DR

This paper introduces DeID-GPT, a GPT-4-based framework for zero-shot de-identification of medical texts, achieving high accuracy in removing private information while maintaining text integrity.

Contribution

The study presents a novel GPT-4-powered approach for medical text de-identification that outperforms existing methods in accuracy and reliability without requiring fine-tuning.

Findings

01

DeID-GPT achieves the highest accuracy among tested methods.

02

It reliably masks private information in unstructured medical texts.

03

The approach preserves original text structure and meaning.

Abstract

The digitization of healthcare has facilitated the sharing and re-using of medical data but has also raised concerns about confidentiality and privacy. HIPAA (Health Insurance Portability and Accountability Act) mandates removing re-identifying information before the dissemination of medical records. Thus, effective and efficient solutions for de-identifying medical data, especially those in free-text forms, are highly needed. While various computer-assisted de-identification methods, including both rule-based and learning-based, have been developed and used in prior practice, such solutions still lack generalizability or need to be fine-tuned according to different scenarios, significantly imposing restrictions in wider use. The advancement of large language models (LLM), such as ChatGPT and GPT-4, have shown great potential in processing text data in the medical domain with zero-shot…

Peer Reviews

No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.

Code & Models

Repositories

yhydhx/chatgpt-api
noneOfficial

Videos

No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.

Taxonomy

TopicsTopic Modeling · Artificial Intelligence in Healthcare and Education · Machine Learning in Healthcare

MethodsMulti-Head Attention · Attention Is All You Need · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Softmax · Linear Layer · Byte Pair Encoding · Layer Normalization · Residual Connection · Dense Connections