ACE-ICD: Acronym Expansion As Data Augmentation For Automated ICD Coding

Tuan-Dung Le; Shohreh Haddadan; Thanh Q. Thieu

arXiv:2511.07311·cs.CL·November 11, 2025

Open Access

TL;DR

This paper introduces ACE-ICD, a data augmentation method using large language models to expand medical acronyms in clinical notes, significantly improving automated ICD coding accuracy.

Contribution

It presents a novel acronym expansion technique combined with consistency training, achieving state-of-the-art results in ICD coding tasks.

Findings

01. Improved accuracy on the MIMIC-III dataset

02. Enhanced performance on rare and common codes

03. State-of-the-art results across multiple settings

Abstract

Automatic ICD coding, the task of assigning disease and procedure codes to electronic medical records, is crucial for clinical documentation and billing. While existing methods primarily enhance model understanding of code hierarchies and synonyms, they often overlook the pervasive use of medical acronyms in clinical notes, a key factor in ICD code inference. To address this gap, we propose a novel and effective data augmentation technique that leverages large language models to expand medical acronyms, allowing models to be trained on their full-form representations. Moreover, we incorporate consistency training to regularize predictions by enforcing agreement between the original and augmented documents. Extensive experiments on the MIMIC-III dataset demonstrate that our approach, ACE-ICD, establishes new state-of-the-art performance across multiple settings, including common codes, rare…
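The two ideas in the abstract, acronym expansion and consistency training, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the hand-written `ACRONYMS` table stands in for the large language model, and the symmetric KL divergence between per-code probabilities is only one common way to enforce agreement between predictions on the original and augmented notes. The function names `expand_acronyms` and `consistency_loss` are hypothetical.

```python
import math
import re

# Hypothetical lookup table. The paper uses a large language model for
# expansion; a fixed dictionary is used here only to keep the sketch runnable.
ACRONYMS = {
    "CHF": "congestive heart failure",
    "COPD": "chronic obstructive pulmonary disease",
    "HTN": "hypertension",
}


def expand_acronyms(note, table=ACRONYMS):
    """Replace whole-word acronyms in a note with their full forms."""
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, table)) + r")\b")
    return pattern.sub(lambda m: table[m.group(1)], note)


def bernoulli_kl(p, q, eps=1e-7):
    """KL divergence between two Bernoulli distributions with means p and q."""
    p = min(max(p, eps), 1 - eps)
    q = min(max(q, eps), 1 - eps)
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))


def consistency_loss(probs_orig, probs_aug):
    """Mean symmetric KL over codes (an assumed choice of agreement penalty).

    probs_orig / probs_aug are per-code probabilities predicted for the
    original note and for its acronym-expanded version.
    """
    if len(probs_orig) != len(probs_aug):
        raise ValueError("prediction vectors must have the same length")
    total = sum(
        bernoulli_kl(p, q) + bernoulli_kl(q, p)
        for p, q in zip(probs_orig, probs_aug)
    )
    return total / len(probs_orig)


note = "Pt with CHF and HTN admitted for COPD exacerbation."
augmented = expand_acronyms(note)
print(augmented)

# Identical predictions incur no penalty; diverging predictions are penalized.
print(consistency_loss([0.9, 0.1], [0.9, 0.1]))
print(consistency_loss([0.9, 0.1], [0.5, 0.5]))
```

In a training loop, a term like this would typically be added to the usual multi-label classification loss with a weighting coefficient. How ACE-ICD weights or defines the term is not stated in the text above.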

Taxonomy

Topics: Biomedical Text Mining and Ontologies · Machine Learning in Healthcare · Topic Modeling