FHIRPath-QA: Executable Question Answering over FHIR Electronic Health Records
Michael Frew, Nishit Bheda, Bryan Tripp

TL;DR
This paper introduces FHIRPath-QA, a new dataset and benchmark for patient-specific question answering over electronic health records using FHIRPath queries, demonstrating that query synthesis improves efficiency and accuracy in clinical QA tasks.
Contribution
The work presents the first open dataset and benchmark for FHIRPath-based clinical question answering, proposing a text-to-FHIRPath paradigm that enhances efficiency and accuracy over traditional retrieval methods.
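To make the text-to-FHIRPath idea concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the toy Bundle, the question, and the FHIRPath expression are invented, not taken from the dataset, and the query is evaluated by a hand-written Python equivalent rather than a real FHIRPath engine. The point is only that the model emits a short query and the answer is computed from the record, so the record never has to be placed in the prompt.

```python
import json

# Toy record: invented FHIR-like resources in a Bundle (not from FHIRPath-QA).
bundle = {
    "resourceType": "Bundle",
    "entry": [
        {"resource": {
            "resourceType": "Observation",
            "code": {"coding": [{"system": "http://loinc.org", "code": "718-7"}]},
            "valueQuantity": {"value": 13.2, "unit": "g/dL"},
        }},
        {"resource": {
            "resourceType": "Observation",
            "code": {"coding": [{"system": "http://loinc.org", "code": "2345-7"}]},
            "valueQuantity": {"value": 98, "unit": "mg/dL"},
        }},
        {"resource": {"resourceType": "Condition", "code": {"text": "Anemia"}}},
    ],
}

question = "What was my hemoglobin result?"

# Hypothetical query a model might synthesize for the question above.
fhirpath_query = (
    "Bundle.entry.resource.ofType(Observation)"
    ".where(code.coding.exists(code = '718-7'))"
    ".value.ofType(Quantity).value"
)


def evaluate_by_hand(bundle):
    """Plain-Python stand-in for fhirpath_query (no FHIRPath engine involved)."""
    values = []
    for entry in bundle.get("entry", []):
        resource = entry.get("resource", {})
        # ofType(Observation): keep only Observation resources.
        if resource.get("resourceType") != "Observation":
            continue
        # where(code.coding.exists(code = '718-7')): match any coding with that code.
        codings = resource.get("code", {}).get("coding", [])
        if not any(c.get("code") == "718-7" for c in codings):
            continue
        # value.ofType(Quantity).value: the numeric part of valueQuantity.
        quantity = resource.get("valueQuantity")
        if quantity is not None and "value" in quantity:
            values.append(quantity["value"])
    return values


answer = evaluate_by_hand(bundle)
record_chars = len(json.dumps(bundle))  # what a retrieval-style prompt would carry
query_chars = len(fhirpath_query)       # what a query-synthesis output carries

print(answer)  # [13.2]
print(query_chars < record_chars)
```

Character counts are used here as a rough proxy for prompt size; they are not token counts and say nothing about the paper's measured figures.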
Findings
Supervised fine-tuning improves query synthesis accuracy from 27% to 79%.
Text-to-FHIRPath reduces token usage by 391x compared to retrieval-based prompting.
LLMs achieve up to 42% accuracy on the task, indicating room for improvement.
Abstract
Though patients are increasingly granted digital access to their electronic health records (EHRs), existing interfaces may not support precise, trustworthy answers to patient-specific questions. Large language models (LLMs) show promise in clinical question answering (QA), but retrieval-based approaches are computationally inefficient, prone to hallucination, and difficult to deploy over real-life EHRs. This work introduces FHIRPath-QA, the first open dataset and benchmark for patient-specific QA that includes open-standard FHIRPath queries over real-world clinical data. A text-to-FHIRPath QA paradigm is proposed that shifts reasoning from free-text generation to FHIRPath query synthesis. For o4-mini, this reduced average token usage by 391x relative to retrieval-first prompting (629,829 vs 1,609 tokens per question) and lowered failure rates from 0.36 to 0.09 on clinician-phrased…
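The headline figures in the abstract can be checked with a few lines of arithmetic. The sketch below uses only the numbers quoted above; the variable names are illustrative, and the relative failure-rate reduction is a derived quantity that the abstract itself does not state.

```python
# Figures quoted in the abstract (o4-mini).
retrieval_tokens = 629_829   # average tokens per question, retrieval-first prompting
fhirpath_tokens = 1_609      # average tokens per question, text-to-FHIRPath

token_ratio = retrieval_tokens / fhirpath_tokens
print(f"token reduction: {token_ratio:.1f}x")  # token reduction: 391.4x

failure_before = 0.36
failure_after = 0.09

absolute_drop = failure_before - failure_after
relative_drop = absolute_drop / failure_before
print(f"failure rate: -{absolute_drop:.2f} absolute, -{relative_drop:.0%} relative")
```

The ratio rounds to the reported 391x, and the drop from 0.36 to 0.09 corresponds to a failure rate one quarter of its original value.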
Taxonomy
Topics: Machine Learning in Healthcare · Topic Modeling · Multimodal Machine Learning Applications
