MAPLE: Elevating Medical Reasoning from Statistical Consensus to Process-Led Alignment
Kailong Fan, Anqi Pu, Yichen Wu, Wanhua Li, Yicong Li, Hanspeter Pfister, Huafeng Liu, Xiang Li, Quanzheng Li, and Ning Guo

TL;DR
This paper introduces a new training paradigm for medical large language models that replaces heuristic majority voting with expert-aligned rewards, significantly improving reasoning accuracy in complex medical scenarios.
Contribution
It proposes integrating medical process reward models with test-time reinforcement learning to guide models with medical correctness rather than consensus.
Findings
Outperforms existing TTRL and PRM methods on four benchmarks.
Demonstrates the importance of structured, step-wise rewards for medical AI.
Enhances model reliability and scalability in medical reasoning.
Abstract
Recent advances in medical large language models have explored Test-Time Reinforcement Learning (TTRL) to enhance reasoning. However, standard TTRL often relies on majority voting (MV) as a heuristic supervision signal, which can be unreliable in complex medical scenarios where the most frequent reasoning path is not necessarily the clinically correct one. In this work, we propose a novel and unified training paradigm that integrates medical process reward models with TTRL to bridge the gap between test-time scaling (TTS) and parametric model optimization. Specifically, we advance the TTRL framework by replacing the conventional MV with a fine-grained, expert-aligned supervision paradigm using Med-PRM. This integration ensures that reinforcement learning is guided by medical correctness rather than mere consensus, effectively distilling search-based intelligence into the model's…
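The contrast the abstract draws between consensus-based and process-based supervision can be illustrated with a toy sketch. This is not the paper's implementation: the function names, the `step_scorer` callable (a stand-in for a medical process reward model), and the choice of `min` to aggregate step scores are all assumptions made for illustration.

```python
from collections import Counter
from typing import Callable, List, Sequence


def majority_vote_rewards(answers: Sequence[str]) -> List[float]:
    """Consensus-style pseudo-labels: reward 1.0 for rollouts whose final
    answer matches the most frequent answer, 0.0 otherwise."""
    consensus, _ = Counter(answers).most_common(1)[0]
    return [1.0 if a == consensus else 0.0 for a in answers]


def process_rewards(
    rollouts: Sequence[Sequence[str]],
    step_scorer: Callable[[str], float],
    aggregate: Callable = min,
) -> List[float]:
    """Process-style rewards: score every reasoning step, then aggregate.
    Using `min` (a chain is as good as its weakest step) is an assumption,
    not something the abstract specifies."""
    return [aggregate(step_scorer(step) for step in steps) for steps in rollouts]


def toy_step_scorer(step: str) -> float:
    """Hypothetical stand-in for a process reward model."""
    return 0.1 if "flawed" in step else 0.9


# Three sampled rollouts: two agree on answer "A" via a flawed step,
# one reaches answer "B" through sound steps.
rollouts = [
    ["sound premise", "flawed inference"],
    ["sound premise", "flawed inference"],
    ["sound premise", "sound inference"],
]
answers = ["A", "A", "B"]

mv = majority_vote_rewards(answers)
pr = process_rewards(rollouts, toy_step_scorer)
```

In this toy setup, majority voting rewards the two rollouts that agree with each other, while the step-wise score favors the single rollout whose reasoning holds up. That is the failure mode the abstract attributes to MV-based supervision.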
Taxonomy
Topics: Machine Learning in Healthcare · Artificial Intelligence in Healthcare and Education · Clinical Reasoning and Diagnostic Skills
