Large Language Models for Automating Clinical Data Standardization: HL7 FHIR Use Case

Alvaro Riquelme; Pedro Costa; Catalina Martinez

arXiv:2507.03067 · cs.CL · July 8, 2025

TL;DR

This paper demonstrates that large language models like GPT-4o and Llama 3.2 can effectively automate the conversion of clinical data into HL7 FHIR format, improving interoperability in healthcare data exchange.

Contribution

It introduces a semi-automated, prompt-based method leveraging LLMs for clinical data standardization into HL7 FHIR, with high accuracy demonstrated on the MIMIC-IV dataset.

Findings

01

Resource identification achieved a perfect F1-score with GPT-4o.

02

Accuracy remained high at 94% under real-world conditions.

03

Prompt refinement mitigated hallucinations and mismatches.

Abstract

For years, semantic interoperability standards have sought to streamline the exchange of clinical data, yet their deployment remains time-consuming, resource-intensive, and technically challenging. To address this, we introduce a semi-automated approach that leverages large language models, specifically GPT-4o and Llama 3.2 405b, to convert structured clinical datasets into HL7 FHIR format while assessing accuracy, reliability, and security. Applying our method to the MIMIC-IV database, we combined embedding techniques, clustering algorithms, and semantic retrieval to craft prompts that guide the models in mapping each tabular field to its corresponding FHIR resource. In an initial benchmark, resource identification achieved a perfect F1-score, with GPT-4o outperforming Llama 3.2 thanks to the inclusion of FHIR resource schemas within the prompt. Under real-world conditions, accuracy…
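The retrieval-then-prompt idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: a toy bag-of-words cosine similarity stands in for real embeddings, the clustering step is omitted, and no model is called. The resource descriptions, the function names (`retrieve`, `build_prompt`), and the prompt wording are all hypothetical; the table and column names are only examples in the style of MIMIC-IV.

```python
import math
import re
from collections import Counter

# Hypothetical one-line descriptions of a few FHIR resource types.
# They are paraphrases written for this sketch, not the official schemas.
FHIR_RESOURCES = {
    "Patient": "demographics gender birth date death subject identifier",
    "Encounter": "hospital admission discharge time admission type location",
    "Observation": "lab measurement value unit chart time result",
    "MedicationRequest": "drug prescription dose route start stop time",
}


def tokenize(text):
    """Lowercase and split on anything that is not a letter or digit."""
    return [tok for tok in re.split(r"[^a-z0-9]+", text.lower()) if tok]


def cosine(a, b):
    """Cosine similarity between two token-count vectors (Counters)."""
    dot = sum(count * b[tok] for tok, count in a.items() if tok in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(field_text, k=2):
    """Return the k resource names most similar to the field text.

    Assumption: token overlap replaces the embedding model used in the paper.
    Ties are broken alphabetically so the result is deterministic.
    """
    query = Counter(tokenize(field_text))
    scored = [
        (cosine(query, Counter(tokenize(desc))), name)
        for name, desc in FHIR_RESOURCES.items()
    ]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for _, name in scored[:k]]


def build_prompt(table, column, description, k=2):
    """Assemble a mapping prompt that lists only the retrieved candidates."""
    candidates = retrieve(f"{column} {description}", k)
    lines = [
        "You map tabular clinical fields to HL7 FHIR resources.",
        f"Field: {table}.{column}",
        f"Description: {description}",
        "Candidate resources: " + ", ".join(candidates),
        "Answer with exactly one candidate resource name.",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_prompt("patients", "gender", "patient gender demographics"))
```

Restricting the prompt to a short list of retrieved candidates keeps the model's answer space small, which is one plausible way a setup like this could reduce mismatched resource names; the paper's own prompt construction may differ.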
