Large Language Models for Automating Clinical Data Standardization: HL7 FHIR Use Case
Alvaro Riquelme, Pedro Costa, Catalina Martinez

TL;DR
This paper demonstrates that large language models like GPT-4o and Llama 3.2 can effectively automate the conversion of clinical data into HL7 FHIR format, improving interoperability in healthcare data exchange.
Contribution
It introduces a semi-automated, prompt-based method leveraging LLMs for clinical data standardization into HL7 FHIR, with high accuracy demonstrated on the MIMIC-IV dataset.
Findings
Resource identification achieved a perfect F1-score with GPT-4o.
Accuracy remained high at 94% under real-world conditions.
Prompt refinement mitigated hallucinations and mismatches.
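As a rough illustration of what a perfect F1-score on resource identification means, the sketch below scores predicted (field, resource) pairs against a reference mapping. The pair-level (micro) formulation, the function name, and the example field names are assumptions made for illustration; this summary does not state how the paper computes or averages F1.

```python
def f1_score(predicted, expected):
    """F1 over sets of (field, resource) pairs (assumed micro formulation)."""
    predicted, expected = set(predicted), set(expected)
    true_positives = len(predicted & expected)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(expected)
    return 2 * precision * recall / (precision + recall)


# Hypothetical reference mapping for two tabular fields.
reference = {("admittime", "Encounter"), ("gender", "Patient")}

# Every field mapped to the right resource: precision = recall = 1.
perfect = f1_score(reference, reference)

# One of two fields mapped to the wrong resource: precision = recall = 0.5.
partial = f1_score({("admittime", "Encounter"), ("gender", "Observation")}, reference)
```

A score of 1.0 requires that no field is mapped to a wrong resource and no expected mapping is missed.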
Abstract
For years, semantic interoperability standards have sought to streamline the exchange of clinical data, yet their deployment remains time-consuming, resource-intensive, and technically challenging. To address this, we introduce a semi-automated approach that leverages large language models, specifically GPT-4o and Llama 3.2 405b, to convert structured clinical datasets into HL7 FHIR format while assessing accuracy, reliability, and security. Applying our method to the MIMIC-IV database, we combined embedding techniques, clustering algorithms, and semantic retrieval to craft prompts that guide the models in mapping each tabular field to its corresponding FHIR resource. In an initial benchmark, resource identification achieved a perfect F1-score, with GPT-4o outperforming Llama 3.2 thanks to the inclusion of FHIR resource schemas within the prompt. Under real-world conditions, accuracy…
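The retrieval-then-prompt idea described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: a toy bag-of-words vector stands in for a real embedding model, the clustering step is omitted, and the resource descriptions, function names, and prompt wording are all hypothetical rather than taken from the paper or the FHIR specification.

```python
import math
import re
from collections import Counter

# Hypothetical, heavily abbreviated descriptions; NOT the official FHIR schemas.
FHIR_RESOURCES = {
    "Patient": "patient demographics gender birth date identifier",
    "Encounter": "hospital admission discharge time encounter location type",
    "Observation": "laboratory measurement value unit vital sign result",
    "MedicationRequest": "prescription drug dose route medication order",
}


def embed(text):
    """Toy bag-of-words vector; an assumed stand-in for a real embedding model."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[term] * b[term] for term in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(field_description, k=2):
    """Return the k resource names whose descriptions best match the field."""
    query = embed(field_description)
    ranked = sorted(
        FHIR_RESOURCES,
        key=lambda name: cosine(query, embed(FHIR_RESOURCES[name])),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(table, column, description, k=2):
    """Assemble a prompt that lists only the retrieved candidate resources."""
    candidates = retrieve(description, k)
    context = "\n".join(f"- {name}: {FHIR_RESOURCES[name]}" for name in candidates)
    return (
        f"Candidate FHIR resources:\n{context}\n\n"
        f"Table: {table}\nColumn: {column}\nDescription: {description}\n"
        "Which FHIR resource does this column map to? Answer with one resource name."
    )


prompt = build_prompt("admissions", "admittime", "time of hospital admission")
```

Restricting the prompt to a few retrieved candidates keeps it short and gives the model explicit schema context to choose from; the resulting string would then be sent to whichever model is being evaluated.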
