MAPLE: Elevating Medical Reasoning from Statistical Consensus to Process-Led Alignment

Kailong Fan; Anqi Pu; Yichen Wu; Wanhua Li; Yicong Li; Hanspeter Pfister; Huafeng Liu; Xiang Li; Quanzheng Li; and Ning Guo

arXiv:2603.08987 · cs.LG · March 11, 2026



Open Access

TL;DR

This paper introduces a new training paradigm for medical large language models that replaces heuristic majority voting with expert-aligned rewards, significantly improving reasoning accuracy in complex medical scenarios.

Contribution

It proposes integrating medical process reward models with test-time reinforcement learning to guide models with medical correctness rather than consensus.

Findings

01. Outperforms existing TTRL and PRM methods on four benchmarks.

02. Demonstrates the importance of structured, step-wise rewards for medical AI.

03. Enhances model reliability and scalability in medical reasoning.

Abstract

Recent advances in medical large language models have explored Test-Time Reinforcement Learning (TTRL) to enhance reasoning. However, standard TTRL often relies on majority voting (MV) as a heuristic supervision signal, which can be unreliable in complex medical scenarios where the most frequent reasoning path is not necessarily the clinically correct one. In this work, we propose a novel and unified training paradigm that integrates medical process reward models with TTRL to bridge the gap between test-time scaling (TTS) and parametric model optimization. Specifically, we advance the TTRL framework by replacing the conventional MV with a fine-grained, expert-aligned supervision paradigm using Med-PRM. This integration ensures that reinforcement learning is guided by medical correctness rather than mere consensus, effectively distilling search-based intelligence into the model's…
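The contrast the abstract draws between consensus-based and process-based supervision can be illustrated with a toy sketch. This is not the authors' implementation: the function names, the `step_scorer` callable standing in for a process reward model, the hard-coded step scores, and the choice to aggregate step scores with `min` are all assumptions made for illustration.

```python
from collections import Counter
from typing import Callable, List, Sequence


def majority_vote_rewards(answers: Sequence[str]) -> List[float]:
    """Consensus signal: reward 1.0 for rollouts whose final answer
    matches the most frequent answer, 0.0 otherwise."""
    consensus, _ = Counter(answers).most_common(1)[0]
    return [1.0 if a == consensus else 0.0 for a in answers]


def process_rewards(
    rollouts: Sequence[Sequence[str]],
    step_scorer: Callable[[str], float],
) -> List[float]:
    """Process signal: score every reasoning step and rate each rollout
    by its weakest step. The min aggregation is an assumption; each
    rollout is assumed to contain at least one step."""
    return [min(step_scorer(step) for step in steps) for steps in rollouts]


# Hypothetical step scores standing in for a trained process reward model.
TOY_SCORES = {
    "skip differential": 0.2,
    "check labs": 0.9,
    "match guideline": 0.8,
}

rollouts = [
    ["check labs", "skip differential"],  # final answer "B"
    ["skip differential"],                # final answer "B"
    ["check labs", "match guideline"],    # final answer "A"
]
answers = ["B", "B", "A"]

mv = majority_vote_rewards(answers)
pr = process_rewards(rollouts, TOY_SCORES.__getitem__)
best_by_process = max(range(len(pr)), key=pr.__getitem__)
```

In this toy case majority voting rewards the two rollouts that agree on "B" even though both contain a weak step, while the step-wise score ranks the minority rollout highest. That is the failure mode of consensus-based supervision the abstract describes.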


Taxonomy

Topics: Machine Learning in Healthcare · Artificial Intelligence in Healthcare and Education · Clinical Reasoning and Diagnostic Skills