MedS$^3$: Towards Medical Slow Thinking with Self-Evolved Soft Dual-sided Process Supervision

Shuyang Jiang; Yusheng Liao; Zhe Chen; Ya Zhang; Yanfeng Wang; Yu Wang

arXiv:2501.12051 · cs.CL · November 26, 2025

TL;DR

MedS3 is a novel self-evolving framework that enhances small medical language models with robust, fine-grained reasoning capabilities through reinforcement learning and a dual process reward system, improving accuracy and reasoning fidelity.

Contribution

The paper introduces MedS3, a self-evolving, rule-verifiable reasoning framework for medical language models, utilizing Monte Carlo Tree Search and a dual process reward model for improved clinical reasoning.

Findings

01 Outperforms previous state-of-the-art medical models by +6.45 accuracy points.

02 Surpasses 32B-scale general reasoning models by +8.57 points.

03 Achieves robust and faithful reasoning behavior in medical tasks.

Abstract

Medical language models face critical barriers to real-world clinical reasoning applications. Mainstream efforts fall short in task coverage, lack fine-grained supervision for intermediate reasoning steps, and rely on proprietary systems, so they remain far from a versatile, credible, and efficient language model for clinical reasoning. To this end, we propose MedS3, a self-evolving framework that imparts robust reasoning capabilities to small, deployable models. Starting with 8,000 curated instances sampled via a curriculum strategy across five medical domains and 16 datasets, we use a small base policy model to conduct Monte Carlo Tree Search (MCTS) for constructing rule-verifiable reasoning trajectories. Self-explored reasoning trajectories, ranked by node values, are used to bootstrap the policy model via reinforcement fine-tuning and preference learning. Moreover, we…
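
The step in which trajectories are ranked by node value and then used for preference learning can be illustrated with a minimal sketch. This is not the authors' implementation: the `Trajectory` record, the `build_preference_pairs` helper, and the `min_margin` threshold are hypothetical names introduced here, and the sketch assumes each trajectory already carries a scalar node value produced by the tree search.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Trajectory:
    """Hypothetical record: one reasoning path plus its node value from tree search."""
    text: str
    value: float


def build_preference_pairs(
    trajectories: List[Trajectory], min_margin: float = 0.2
) -> List[Tuple[Trajectory, Trajectory]]:
    """Rank trajectories by value and pair high-value ones with low-value ones.

    Returns (chosen, rejected) pairs whose value gap is at least min_margin.
    The margin is an illustrative assumption, not a setting taken from the paper.
    """
    ranked = sorted(trajectories, key=lambda t: t.value, reverse=True)
    pairs: List[Tuple[Trajectory, Trajectory]] = []
    lo, hi = 0, len(ranked) - 1
    # Walk inward from both ends: best vs. worst, second best vs. second worst.
    # The gap can only shrink as we move inward, so stop at the first small gap.
    while lo < hi and ranked[lo].value - ranked[hi].value >= min_margin:
        pairs.append((ranked[lo], ranked[hi]))
        lo += 1
        hi -= 1
    return pairs


if __name__ == "__main__":
    # Toy values for a single question; real values would come from the search tree.
    trajs = [
        Trajectory("a", 0.9),
        Trajectory("b", 0.1),
        Trajectory("c", 0.6),
        Trajectory("d", 0.5),
    ]
    for chosen, rejected in build_preference_pairs(trajs):
        print(f"chosen={chosen.text} rejected={rejected.text}")
```

Pairs of this shape are the usual input to a preference-learning objective; whether MedS3 applies a margin or a different pairing rule is not stated in the abstract.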

Code & Models

Repositories

pixas/medsss (PyTorch, official)

Datasets

pixas/MedSSS-data (dataset, 13 downloads)

Videos

MedS³: Towards Medical Slow Thinking with Self-Evolved Soft Dual-sided Process Supervision· underline

Taxonomy

Topics: Topic Modeling

Methods: Balanced Selection