Scaling Medical Reasoning Verification via Tool-Integrated Reinforcement Learning

Hang Zhang; Ruheng Wang; Yuelyu Ji; Mingu Kwak; Xizhi Wu; Chenyu Li; Li Zhang; Wenqi Shi; Yifan Peng; Yanshan Wang

arXiv:2601.20221 · cs.AI · January 29, 2026

TL;DR

This paper introduces $\method$, a reinforcement learning framework that improves medical reasoning verification through iterative, tool-augmented evidence retrieval, raising accuracy while reducing sampling costs.

Contribution

It presents a novel agentic framework that trains medical reasoning verifiers to iteratively query external medical data during evaluation, addressing limitations of existing reward models.

Findings

01

Improves MedQA accuracy by 23.5%

02

Enhances MedXpertQA accuracy by 32.0%

03

Reduces the required sampling budget eightfold

Abstract

Large language models have achieved strong performance on medical reasoning benchmarks, yet their deployment in clinical settings demands rigorous verification to ensure factual accuracy. While reward models offer a scalable approach for reasoning trace verification, existing methods face two limitations: they produce only scalar reward values without explicit justification, and they rely on single-pass retrieval that precludes adaptive knowledge access as verification unfolds. We introduce $\method$, an agentic framework that addresses these limitations by training medical reasoning verifiers to iteratively query external medical corpora during evaluation. Our approach combines tool-augmented verification with an iterative reinforcement learning paradigm that requires only trace-level supervision, alongside an adaptive curriculum mechanism that dynamically adjusts training data…
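The iterative, tool-augmented verification loop described in the abstract can be pictured with a minimal sketch. This is not the authors' implementation: the names `Search`, `Verdict`, `retrieve`, `verify`, `toy_policy`, and `trace_level_reward` are hypothetical, the three-sentence corpus is a toy stand-in for an external medical corpus, the scripted policy stands in for a trained verifier model, and the binary reward is only one plausible reading of "trace-level supervision".

```python
import re
from dataclasses import dataclass
from typing import Callable, List, Tuple, Union


@dataclass
class Search:
    """Action: ask the retrieval tool for more evidence."""
    query: str


@dataclass
class Verdict:
    """Action: stop and judge the whole trace, with an explicit justification."""
    correct: bool
    justification: str


Action = Union[Search, Verdict]
Policy = Callable[[str, List[str]], Action]


def tokens(text: str) -> set:
    """Lowercased alphanumeric tokens of a string."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query: str, corpus: List[str], k: int = 1) -> List[str]:
    """Toy retriever (assumption): rank passages by word overlap with the query."""
    q = tokens(query)
    return sorted(corpus, key=lambda p: len(q & tokens(p)), reverse=True)[:k]


def verify(trace: str, policy: Policy, corpus: List[str],
           max_turns: int = 3) -> Tuple[Verdict, List[str]]:
    """Alternate between policy decisions and retrieval until a verdict is issued."""
    evidence: List[str] = []
    for _ in range(max_turns):
        action = policy(trace, evidence)
        if isinstance(action, Verdict):
            return action, evidence
        # The next query can depend on everything retrieved so far.
        evidence.extend(retrieve(action.query, corpus))
    # Design choice of this sketch: reject the trace if the budget runs out.
    return Verdict(False, "retrieval budget exhausted without a verdict"), evidence


def trace_level_reward(verdict: Verdict, trace_is_correct: bool) -> float:
    """Assumed reward: 1.0 when the verdict matches the trace-level label, else 0.0."""
    return 1.0 if verdict.correct == trace_is_correct else 0.0


def toy_policy(trace: str, evidence: List[str]) -> Action:
    """Scripted stand-in for a trained verifier: search once, then judge."""
    if not evidence:
        return Search("metformin renal impairment")
    contradicted = "safe" in trace and any("contraindicated" in p for p in evidence)
    return Verdict(not contradicted, f"checked against: {evidence[0]}")


# Illustrative corpus and reasoning trace (not taken from the paper).
CORPUS = [
    "Metformin is a first-line oral agent for type 2 diabetes.",
    "Metformin is contraindicated in severe renal impairment.",
    "Warfarin requires INR monitoring.",
]
TRACE = "Metformin is safe in severe renal impairment."

verdict, evidence = verify(TRACE, toy_policy, CORPUS)
reward = trace_level_reward(verdict, trace_is_correct=False)
```

In this sketch the verifier returns a justified verdict rather than a bare scalar, and retrieval happens inside the loop rather than once up front, which are the two limitations of existing reward models that the abstract names.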

Taxonomy

Topics: Machine Learning in Healthcare · Artificial Intelligence in Healthcare and Education · Topic Modeling