Eliciting Medical Reasoning with Knowledge-enhanced Data Synthesis: A Semi-Supervised Reinforcement Learning Approach

Haolin Li; Shuyang Jiang; Ruipeng Zhang; Jiangchao Yao; Ya Zhang; Yanfeng Wang

arXiv:2604.11547 · cs.LG · April 14, 2026

1 Repo · 1 Model · 2 Datasets

TL;DR

MedSSR is a semi-supervised reinforcement learning framework that synthesizes medical reasoning questions from rare-disease knowledge and labels them with pseudo-labels generated by the policy model itself. This improves performance on rare-disease benchmarks without the cost of distilling reasoning traces from large proprietary models.

Contribution

It introduces a two-stage training paradigm that combines knowledge-guided synthetic data generation with self-generated pseudo-labels to enhance medical reasoning efficiently.

Findings

01

Outperforms existing methods on ten medical benchmarks.

02

Achieves up to +5.93% gain on rare-disease tasks.

03

Scales training efficiently without costly trace distillation.

Abstract

While large language models hold promise for complex medical applications, their development is hindered by the scarcity of high-quality reasoning data. To address this issue, existing approaches typically distill chain-of-thought reasoning traces from large proprietary models via supervised fine-tuning, then conduct reinforcement learning (RL). These methods exhibit limited improvement on underrepresented domains like rare diseases while incurring substantial costs from generating complex reasoning chains. To efficiently enhance medical reasoning, we propose MedSSR, a Medical Knowledge-enhanced data Synthesis and Semi-supervised Reinforcement learning framework. Our framework first employs rare disease knowledge to synthesize distribution-controllable reasoning questions. We then utilize the policy model itself to generate high-quality pseudo-labels. This enables a two-stage,…
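
The abstract does not say how the policy model turns its own outputs into pseudo-labels. One common approach, assumed here and not confirmed by this page, is majority voting over several answers sampled from the policy, keeping a question only when agreement is high enough and then using the agreed answer as the target of a binary RL reward. The minimal sketch below illustrates that idea for multiple-choice questions; `pseudo_label`, `reward`, and the `min_agreement` threshold are hypothetical names, not taken from the tdlhl/MedSSR repository.

```python
from collections import Counter
from typing import Optional


def pseudo_label(sampled_answers: list[str], min_agreement: float = 0.5) -> Optional[str]:
    """Hypothetical majority-vote pseudo-label over answers sampled from the policy.

    Empty (unparseable) answers count toward the total but cast no vote.
    Returns None when no answer is shared by more than `min_agreement`
    of the samples, so low-consensus questions can be filtered out.
    """
    if not sampled_answers:
        return None
    # Normalize option letters such as " b" -> "B" before counting.
    normalized = [a.strip().upper() for a in sampled_answers]
    votes = Counter(a for a in normalized if a)
    if not votes:
        return None
    label, count = votes.most_common(1)[0]
    if count / len(normalized) <= min_agreement:
        return None
    return label


def reward(answer: str, label: str) -> float:
    """Binary verifiable reward: 1.0 if a rollout matches the pseudo-label."""
    return 1.0 if answer.strip().upper() == label else 0.0


# Example: five rollouts for one synthesized multiple-choice question.
rollouts = ["B", "b", "C", "B ", "A"]
label = pseudo_label(rollouts)  # "B" wins 3 of 5 votes, and 0.6 > 0.5
if label is not None:
    rewards = [reward(a, label) for a in rollouts]  # [1.0, 1.0, 0.0, 1.0, 0.0]
```

Requiring a strict majority trades coverage for label quality: questions where the policy is split are dropped rather than trained on with a likely wrong target.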

Code & Models

Repositories

tdlhl/MedSSR (GitHub)

Models

tdlhl/MedSSR-Qwen3-8B-Base (Hugging Face model)
