Reinforcement Learning for Tool-Calling Agents in Fast Healthcare Interoperability Resources (FHIR)
Marius S. Knorr, Robert Müller, Jan P. Bremer, Nils Schweingruber

TL;DR
This paper introduces a reinforcement learning approach to improve multi-step reasoning in healthcare data querying agents over FHIR, enhancing accuracy and data integrity.
Contribution
It presents a novel RL-based training pipeline for FHIR agents, significantly improving reasoning accuracy over prior prompt-based methods.
Findings
Answer correctness improved from 50% to 77% on FHIR-AgentBench.
RL post-training enforces data integrity constraints.
Approach works with smaller, cost-effective models like Qwen3-8B.
Abstract
Fast Healthcare Interoperability Resources (FHIR) is the dominant standard for interoperable exchange of healthcare data. In FHIR, electronic health records form a directed graph of resources. Answering clinically meaningful questions over FHIR requires agents to perform multi-step reasoning, filtering, and aggregation across multiple resource types. Prior work shows that even tool-augmented LLM agents (retrieval, code execution, multi-turn planning) often select the wrong resources or violate traversal constraints. We study this problem in the context of FHIR-AgentBench, a benchmark for realistic question answering over real-world hospital data, and frame reasoning on FHIR as a sequential decision-making problem over a queryable structured graph. We implement a multi-turn CodeAct agent and post-train it with reinforcement learning using a custom harness and tools. An LLM Judge provides…
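To make the "directed graph of resources" framing concrete, here is a minimal Python sketch of the kind of multi-step query the abstract describes: follow references between resources, filter by patient and code, and aggregate per encounter. It is an illustration only, not the authors' agent, harness, or tools. The resources are hand-made rather than taken from FHIR-AgentBench, the helper names (`build_index`, `outgoing_edges`, `mean_value_per_encounter`) are hypothetical, and only a few standard FHIR fields (`resourceType`, `id`, `subject`, `encounter`, `code.text`, `valueQuantity.value`) are assumed.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical, hand-made resources for illustration (not benchmark data).
RESOURCES = [
    {"resourceType": "Patient", "id": "p1"},
    {"resourceType": "Encounter", "id": "e1", "subject": {"reference": "Patient/p1"}},
    {"resourceType": "Encounter", "id": "e2", "subject": {"reference": "Patient/p1"}},
    {"resourceType": "Observation", "id": "o1",
     "subject": {"reference": "Patient/p1"},
     "encounter": {"reference": "Encounter/e1"},
     "code": {"text": "heart rate"},
     "valueQuantity": {"value": 80, "unit": "/min"}},
    {"resourceType": "Observation", "id": "o2",
     "subject": {"reference": "Patient/p1"},
     "encounter": {"reference": "Encounter/e1"},
     "code": {"text": "heart rate"},
     "valueQuantity": {"value": 90, "unit": "/min"}},
    {"resourceType": "Observation", "id": "o3",
     "subject": {"reference": "Patient/p1"},
     "encounter": {"reference": "Encounter/e2"},
     "code": {"text": "heart rate"},
     "valueQuantity": {"value": 70, "unit": "/min"}},
    # Dangling reference: the target encounter is not in the data set.
    {"resourceType": "Observation", "id": "o4",
     "subject": {"reference": "Patient/p1"},
     "encounter": {"reference": "Encounter/missing"},
     "code": {"text": "heart rate"},
     "valueQuantity": {"value": 100, "unit": "/min"}},
    # Different measurement: must be filtered out by code.
    {"resourceType": "Observation", "id": "o5",
     "subject": {"reference": "Patient/p1"},
     "encounter": {"reference": "Encounter/e1"},
     "code": {"text": "body temperature"},
     "valueQuantity": {"value": 37, "unit": "Cel"}},
]


def build_index(resources):
    """Map 'ResourceType/id' strings to resources so references can be resolved."""
    return {f"{r['resourceType']}/{r['id']}": r for r in resources}


def outgoing_edges(resource):
    """Yield (field, target) pairs for top-level reference fields of a resource."""
    for field, value in resource.items():
        if isinstance(value, dict) and "reference" in value:
            yield field, value["reference"]


def mean_value_per_encounter(resources, patient_ref, code_text):
    """Filter observations by patient and code, then average values per encounter.

    Observations whose encounter reference cannot be resolved are skipped,
    as a simple stand-in for a traversal or integrity constraint.
    """
    index = build_index(resources)
    values_by_encounter = defaultdict(list)
    for r in resources:
        if r["resourceType"] != "Observation":
            continue
        if r.get("subject", {}).get("reference") != patient_ref:
            continue
        if r.get("code", {}).get("text") != code_text:
            continue
        encounter_ref = r.get("encounter", {}).get("reference")
        if encounter_ref not in index:
            continue  # dangling edge: do not aggregate over an unknown node
        values_by_encounter[encounter_ref].append(r["valueQuantity"]["value"])
    return {enc: mean(vals) for enc, vals in values_by_encounter.items()}


result = mean_value_per_encounter(RESOURCES, "Patient/p1", "heart rate")
```

Even in this toy setting, a correct answer depends on choosing the right resource type, following the right reference field, and refusing to aggregate over unresolved references, which are the failure modes the abstract attributes to tool-augmented agents.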
