FHIRPath-QA: Executable Question Answering over FHIR Electronic Health Records

Michael Frew; Nishit Bheda; Bryan Tripp

arXiv:2602.23479 · cs.CL · March 26, 2026

TL;DR

This paper introduces FHIRPath-QA, a new dataset and benchmark for patient-specific question answering over electronic health records using FHIRPath queries. It shows that synthesizing executable FHIRPath queries, rather than generating free-text answers from retrieved record content, improves efficiency and accuracy on clinical QA tasks.

Contribution

The work presents the first open dataset and benchmark for FHIRPath-based clinical question answering and proposes a text-to-FHIRPath paradigm that improves efficiency and accuracy over retrieval-based prompting.
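
To make the text-to-FHIRPath idea concrete, here is a minimal Python sketch under loudly stated assumptions: the LLM is replaced by a hypothetical `synthesize_query` lookup table, the records are invented toy data, and the hypothetical `evaluate_path` implements only FHIRPath's dotted path navigation (including its flattening of repeating elements), not the full language. It is not the authors' pipeline; a real system would pass the generated query to a conformant FHIRPath engine. What it illustrates is the division of labor: the model only produces a short query string, while a deterministic evaluator reads the patient record.

```python
from typing import Any

# Hypothetical toy records standing in for one patient's FHIR resources.
RECORDS: list[dict[str, Any]] = [
    {"resourceType": "Patient", "id": "p1",
     "name": [{"given": ["Ana", "Maria"], "family": "Diaz"}]},
    {"resourceType": "MedicationRequest", "id": "m1", "status": "active",
     "medicationCodeableConcept": {"text": "Metformin 500 mg tablet"}},
    {"resourceType": "MedicationRequest", "id": "m2", "status": "stopped",
     "medicationCodeableConcept": {"text": "Lisinopril 10 mg tablet"}},
]


def evaluate_path(resources: list[dict[str, Any]], expression: str) -> list[Any]:
    """Evaluate a dotted path such as 'Patient.name.given' (hypothetical helper).

    Covers only FHIRPath's path-navigation subset: no functions, operators,
    or choice-type resolution.
    """
    type_name, *fields = expression.split(".")
    # Simplification: the leading type name selects resources of that type.
    collection: list[Any] = [r for r in resources if r.get("resourceType") == type_name]
    for field in fields:
        next_collection: list[Any] = []
        for item in collection:
            if not isinstance(item, dict) or field not in item:
                continue  # missing elements contribute nothing rather than raising
            value = item[field]
            # Repeating elements are flattened into a single collection.
            if isinstance(value, list):
                next_collection.extend(value)
            else:
                next_collection.append(value)
        collection = next_collection
    return collection


def synthesize_query(question: str) -> str:
    """Hypothetical stand-in for the LLM that writes a FHIRPath query."""
    canned = {
        "What are my given names?": "Patient.name.given",
        "What medications have been prescribed to me?":
            "MedicationRequest.medicationCodeableConcept.text",
    }
    return canned[question]


def answer(question: str, resources: list[dict[str, Any]]) -> list[Any]:
    # The model emits a short query; only the evaluator touches the record.
    return evaluate_path(resources, synthesize_query(question))


if __name__ == "__main__":
    # prints ['Metformin 500 mg tablet', 'Lisinopril 10 mg tablet']
    print(answer("What medications have been prescribed to me?", RECORDS))
```

Restricting the answer to currently active medications would need a filter such as FHIRPath's `where(status = 'active')`, which this toy evaluator deliberately leaves out.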

Findings

01. Supervised fine-tuning improves query synthesis accuracy from 27% to 79%.

02. For o4-mini, text-to-FHIRPath reduces average token usage by 391x compared to retrieval-first prompting.

03. LLMs achieve up to 42% accuracy on the task, indicating room for improvement.

Abstract

Though patients are increasingly granted digital access to their electronic health records (EHRs), existing interfaces may not support precise, trustworthy answers to patient-specific questions. Large language models (LLMs) show promise in clinical question answering (QA), but retrieval-based approaches are computationally inefficient, prone to hallucination, and difficult to deploy over real-life EHRs. This work introduces FHIRPath-QA, the first open dataset and benchmark for patient-specific QA that includes open-standard FHIRPath queries over real-world clinical data. A text-to-FHIRPath QA paradigm is proposed that shifts reasoning from free-text generation to FHIRPath query synthesis. For o4-mini, this reduced average token usage by 391x relative to retrieval-first prompting (629,829 vs 1,609 tokens per question) and lowered failure rates from 0.36 to 0.09 on clinician-phrased…
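
The 391x figure can be checked directly against the per-question token counts quoted for o4-mini, and the reported failure-rate change works out to a 75% relative reduction (a factor of 4):

$$
\frac{629{,}829}{1{,}609} \approx 391.4,
\qquad
\frac{0.36 - 0.09}{0.36} = 0.75 \quad \left(\text{equivalently } \frac{0.36}{0.09} = 4\right)
$$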

Taxonomy

Topics: Machine Learning in Healthcare · Topic Modeling · Multimodal Machine Learning Applications