LLMs-in-the-Loop Part 2: Expert Small AI Models for Anonymization and De-identification of PHI Across Multiple Languages

Murat Gunay; Bunyamin Keles; Raife Hizlan

arXiv:2412.10918·cs.CL·December 17, 2024

Open Access

TL;DR

This paper presents expert small AI models for multilingual PHI de-identification that outperform large language models, ensuring privacy and reliability in healthcare data processing with high accuracy across eight languages.

Contribution

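It introduces domain-specific small NER models built with the LLM-in-the-loop methodology. The models are reported to outperform existing large models on de-identification and to carry a privacy advantage, since sensitive data does not need to be sent to or stored by an external API.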

Findings

01. Achieved high F1 scores (around 0.95-0.98) across eight languages.

02. Outperformed GPT-4 and other small models in de-identification tasks.

03. Demonstrated cost-effective, privacy-preserving healthcare data anonymization.
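
For readers unfamiliar with how an F1 score for NER is typically computed, the following is a minimal sketch. It assumes strict entity-level matching, where a predicted span counts only if its start, end, and label all match a gold span. The evaluation protocol actually used in the paper is not stated in this summary, so the function name `entity_f1` and the matching rule are illustrative assumptions.

```python
from typing import Set, Tuple

# A span is (start, end, label); this representation is an assumption
# made for illustration, not taken from the paper.
Span = Tuple[int, int, str]


def entity_f1(gold: Set[Span], pred: Set[Span]) -> float:
    """Strict entity-level F1: a prediction is correct only on exact match."""
    true_positives = len(gold & pred)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(pred)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


# Toy data: three gold entities, two of which are predicted correctly.
gold_spans = {(0, 10, "NAME"), (27, 37, "DATE"), (40, 45, "ID")}
pred_spans = {(0, 10, "NAME"), (27, 37, "DATE")}
score = entity_f1(gold_spans, pred_spans)
```

In this toy case precision is 1.0 and recall is 2/3, so the score is 0.8. Relaxed (partial-overlap) matching or per-label averaging would give different numbers.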

Abstract

The rise of chronic diseases and pandemics like COVID-19 has emphasized the need for effective patient data processing while ensuring privacy through anonymization and de-identification of protected health information (PHI). Anonymized data facilitates research without compromising patient confidentiality. This paper introduces expert small AI models developed using the LLM-in-the-loop methodology to meet the demand for domain-specific de-identification NER models. These models overcome the privacy risks associated with large language models (LLMs) used via APIs by eliminating the need to transmit or store sensitive data. More importantly, they consistently outperform LLMs in de-identification tasks, offering superior performance and reliability. Our de-identification NER models, developed in eight languages (English, German, Italian, French, Romanian, Turkish, Spanish, and Arabic)…
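
As a rough illustration of what NER-based de-identification does once entities have been detected, the sketch below replaces each labeled span with a placeholder tag. The span format, the label names, and the function `mask_phi` are hypothetical. The spans are hard-coded here, whereas in practice they would come from a locally run NER model such as those the paper describes.

```python
from typing import List, Tuple

# (start, end, label) with an exclusive end offset; an assumed format.
Span = Tuple[int, int, str]


def mask_phi(text: str, spans: List[Span]) -> str:
    """Replace each labeled character span with a <LABEL> placeholder."""
    pieces = []
    cursor = 0
    for start, end, label in sorted(spans):
        if start < cursor:
            # Skip a span that overlaps one already masked.
            continue
        pieces.append(text[cursor:start])
        pieces.append(f"<{label}>")
        cursor = end
    pieces.append(text[cursor:])
    return "".join(pieces)


note = "John Smith was admitted on 2024-01-05."
detected = [(0, 10, "NAME"), (27, 37, "DATE")]  # hypothetical model output
masked = mask_phi(note, detected)
# masked == "<NAME> was admitted on <DATE>."
```

Because the masking runs entirely on local data, nothing in this step requires sending the original text to an external service, which is the privacy argument the abstract makes for small models.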

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling