Efficient Standardization of Clinical Notes using Large Language Models
Daniel B. Hier, Michael D. Carrithers, Thanh Son Do, Tayo Obafemi-Ajayi

TL;DR
This paper introduces a large language model-based method to standardize clinical notes, improving their consistency, readability, and readiness for data extraction and interoperability in healthcare systems.
Contribution
The study presents a novel LLM approach for comprehensive clinical note standardization, addressing inconsistencies in grammar, spelling, terminology, abbreviations, and formatting.
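As a rough illustration of how the five categories above could be turned into a single instruction for a language model, here is a minimal sketch. The function name, the task wording, and the prompt layout are all assumptions made for this example; the summary does not give the authors' actual prompt or model call, so none is shown.

```python
# Hypothetical sketch: the task list mirrors the five categories named in the
# Contribution statement; the wording and names are illustrative assumptions,
# not the authors' prompt.
STANDARDIZATION_TASKS = [
    "Correct grammatical errors.",
    "Correct spelling errors.",
    "Replace non-standard terms with standard terminology.",
    "Expand abbreviations and acronyms.",
    "Apply consistent formatting.",
]


def build_standardization_prompt(note_text: str) -> str:
    """Return one prompt string asking a model to standardize a single note."""
    # Number the tasks so the instruction list is easy to audit.
    numbered = "\n".join(
        f"{i}. {task}" for i, task in enumerate(STANDARDIZATION_TASKS, start=1)
    )
    return (
        "Standardize the clinical note below without adding or removing "
        "clinical information.\n"
        f"{numbered}\n\n"
        "Note:\n"
        f"{note_text.strip()}"
    )


if __name__ == "__main__":
    # Made-up note text, used only to show the shape of the prompt.
    print(build_standardization_prompt("pt c/o HA x 3 days, no n/v"))
```

Keeping the task list as data rather than hard-coding it into the prompt string would make it straightforward to count or audit corrections per category afterwards.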
Findings
Corrected an average of 4.9 grammatical errors per note
Expanded 15.8 abbreviations and acronyms per note
No significant data loss observed after standardization
Abstract
Clinician notes are a rich source of patient information but often contain inconsistencies due to varied writing styles, colloquialisms, abbreviations, medical jargon, grammatical errors, and non-standard formatting. These inconsistencies hinder the extraction of meaningful data from electronic health records (EHRs), posing challenges for quality improvement, population health, precision medicine, decision support, and research. We present a large language model approach to standardizing a corpus of 1,618 clinical notes. Standardization corrected an average of 4.9 grammatical errors per note, corrected spelling errors, converted non-standard terms to standard terminology, and expanded an average of 15.8 abbreviations and acronyms per note. Additionally, notes were reorganized into canonical sections with standardized headings. This process prepared notes for key…
Taxonomy
Topics: Biomedical Text Mining and Ontologies
