Scaling Medical Reasoning Verification via Tool-Integrated Reinforcement Learning
Hang Zhang, Ruheng Wang, Yuelyu Ji, Mingu Kwak, Xizhi Wu, Chenyu Li, Li Zhang, Wenqi Shi, Yifan Peng, Yanshan Wang

TL;DR
This paper introduces a reinforcement learning framework that improves medical reasoning verification by enabling iterative, tool-augmented evidence retrieval, raising accuracy while reducing sampling costs.
Contribution
It presents an agentic framework that trains medical reasoning verifiers to iteratively query external medical corpora during evaluation, addressing two limitations of existing reward models: scalar-only outputs with no explicit justification, and single-pass retrieval.
Findings
Improves MedQA accuracy by 23.5%
Enhances MedXpertQA accuracy by 32.0%
Reduces the sampling budget 8-fold
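The sampling-budget finding concerns test-time scaling. A policy model samples N candidate reasoning traces per question, and a verifier picks one. A better verifier can reach a given accuracy with a smaller N. The summary does not describe the selection procedure, so the sketch below assumes plain best-of-N selection. The function `best_of_n` and the toy scores are hypothetical illustrations, not the authors' code.

```python
from typing import Callable, Sequence

def best_of_n(candidates: Sequence[str], score: Callable[[str], float]) -> str:
    """Return the candidate trace that the verifier scores highest.

    `score` stands in for any verifier (hypothetical here); ties go to the
    earliest candidate because max() keeps the first maximal element.
    """
    if not candidates:
        raise ValueError("best_of_n needs at least one candidate")
    return max(candidates, key=score)

# Toy usage: three sampled traces with made-up verifier scores.
traces = ["... so the answer is A", "... so the answer is C", "... so the answer is B"]
toy_scores = {traces[0]: 0.2, traces[1]: 0.9, traces[2]: 0.5}
print(best_of_n(traces, toy_scores.__getitem__))  # ... so the answer is C
```

Under this reading, an 8-fold budget reduction would mean matching a baseline's accuracy while sampling one eighth as many candidates.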
Abstract
Large language models have achieved strong performance on medical reasoning benchmarks, yet their deployment in clinical settings demands rigorous verification to ensure factual accuracy. Reward models offer a scalable approach to verifying reasoning traces, but existing methods face two limitations: they produce only scalar reward values without explicit justification, and they rely on single-pass retrieval, which precludes adaptive knowledge access as verification unfolds. We introduce an agentic framework that addresses these limitations by training medical reasoning verifiers to iteratively query external medical corpora during evaluation. Our approach combines tool-augmented verification with an iterative reinforcement learning paradigm that requires only trace-level supervision, alongside an adaptive curriculum mechanism that dynamically adjusts training data…
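The abstract describes a verifier that alternates between judging a trace and querying a medical corpus, trained with only trace-level labels. The sketch below illustrates that control flow under stated assumptions. Everything in it is hypothetical and not taken from the paper: `verify`, `Query`, `Verdict`, the turn budget, the rule-based `toy_policy` (standing in for the trained LLM verifier), the two-passage `toy_retrieve` corpus, and the binary `trace_level_reward`.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple, Union

@dataclass
class Query:
    """Action: ask the retrieval tool for more evidence."""
    text: str

@dataclass
class Verdict:
    """Action: stop and judge the trace, with an explicit justification."""
    correct: bool
    justification: str

Action = Union[Query, Verdict]

@dataclass
class State:
    trace: str
    evidence: List[Tuple[str, List[str]]] = field(default_factory=list)

def verify(trace: str,
           policy: Callable[[State], Action],
           retrieve: Callable[[str], List[str]],
           max_turns: int = 4) -> Verdict:
    """Alternate policy decisions and retrieval calls until a verdict is emitted."""
    state = State(trace)
    for _ in range(max_turns):
        action = policy(state)
        if isinstance(action, Verdict):
            return action
        # Retrieval happens inside the loop, so later queries can depend on earlier results.
        state.evidence.append((action.text, retrieve(action.text)))
    # Assumption: reject conservatively if the turn budget runs out.
    return Verdict(correct=False, justification="no verdict within turn budget")

def trace_level_reward(verdict: Verdict, label: bool) -> float:
    """Assumed binary reward that uses only a trace-level correctness label."""
    return 1.0 if verdict.correct == label else 0.0

# Toy stand-ins (hypothetical): a two-passage corpus and a rule-based policy
# in place of the trained LLM verifier.
CORPUS = {
    "metformin": "Metformin is a first-line oral agent for type 2 diabetes.",
    "warfarin": "Warfarin is a vitamin K antagonist anticoagulant.",
}

def toy_retrieve(query: str) -> List[str]:
    return [text for key, text in CORPUS.items() if key in query.lower()]

def toy_policy(state: State) -> Action:
    if not state.evidence:
        return Query(text=state.trace)  # first turn: search with the trace itself
    _, docs = state.evidence[-1]
    if docs:
        return Verdict(correct=True, justification=docs[0])
    return Verdict(correct=False, justification="no supporting evidence found")

v = verify("Metformin is first-line therapy for type 2 diabetes.", toy_policy, toy_retrieve)
print(v.correct, trace_level_reward(v, label=True))  # True 1.0
```

The loop contrasts with single-pass retrieval. Because `retrieve` is called once per turn, each query can depend on the evidence gathered so far. Each `Verdict` also carries a justification string rather than only a scalar score.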
Taxonomy
Topics: Machine Learning in Healthcare · Artificial Intelligence in Healthcare and Education · Topic Modeling
