Med-U1: Incentivizing Unified Medical Reasoning in LLMs via Large-scale Reinforcement Learning

Xiaotian Zhang; Yuan Wang; Zhaopeng Feng; Ruizhe Chen; Zhijie Zhou; Yan Zhang; Hongxia Xu; Jian Wu; Zuozhu Liu

arXiv:2506.12307·cs.CL·June 23, 2025

Open Access

TL;DR

Med-U1 introduces a unified reinforcement learning framework that enhances large language models' medical reasoning capabilities across diverse question-answering tasks, outperforming specialized models and generalizing well to new, unseen tasks.

Contribution

The paper presents Med-U1, a novel reinforcement learning approach that unifies medical QA tasks with diverse outputs, improving reasoning and performance of LLMs in medical domains.

Findings

01. Significant performance improvements on multiple Med-QA benchmarks.

02. Outperforms larger, specialized, and proprietary models.

03. Demonstrates strong generalization to out-of-distribution tasks.

Abstract

Medical Question-Answering (QA) encompasses a broad spectrum of tasks, including multiple choice questions (MCQ), open-ended text generation, and complex computational reasoning. Despite this variety, a unified framework for delivering high-quality medical QA has yet to emerge. Although recent progress in reasoning-augmented large language models (LLMs) has shown promise, their ability to achieve comprehensive medical understanding is still largely unexplored. In this paper, we present Med-U1, a unified framework for robust reasoning across medical QA tasks with diverse output formats, ranging from MCQs to complex generation and computation tasks. Med-U1 employs pure large-scale reinforcement learning with mixed rule-based binary reward functions, incorporating a length penalty to manage output verbosity. With multi-objective reward optimization, Med-U1 directs LLMs to produce concise…
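
To make the reward design mentioned in the abstract more concrete, the following is a minimal sketch of a rule-based binary reward combined with a length penalty. It is not the paper's implementation: the `Answer:` output convention, the task labels `mcq` and `calc`, the relative tolerance, the word budget, and the penalty weight are all assumptions chosen for illustration, and the open-ended generation case is left out.

```python
import re
from typing import Optional

# Hypothetical output convention: the model ends with a line "Answer: <value>".
ANSWER_RE = re.compile(r"answer:\s*(.+)", re.IGNORECASE)


def extract_answer(response: str) -> Optional[str]:
    """Return the text after the last 'Answer:' marker, or None if absent."""
    matches = ANSWER_RE.findall(response)
    return matches[-1].strip() if matches else None


def binary_reward(task: str, response: str, gold: str, rel_tol: float = 0.05) -> float:
    """Rule-based 0/1 reward. Task names and tolerance are illustrative assumptions."""
    answer = extract_answer(response)
    if answer is None:
        return 0.0
    if task == "mcq":
        # Correct only if the first character matches the gold option letter.
        return 1.0 if answer[:1].upper() == gold.upper() else 0.0
    if task == "calc":
        # Correct only if the number lies within a relative tolerance of the gold value.
        try:
            value = float(answer)
        except ValueError:
            return 0.0
        target = float(gold)
        return 1.0 if abs(value - target) <= rel_tol * abs(target) else 0.0
    raise ValueError(f"unknown task type: {task}")


def length_penalty(response: str, max_words: int = 200, weight: float = 0.5) -> float:
    """Penalty that grows linearly once the response exceeds a word budget (assumed form)."""
    excess = max(0, len(response.split()) - max_words)
    return weight * min(1.0, excess / max_words)


def total_reward(task: str, response: str, gold: str) -> float:
    """Correctness reward minus the verbosity penalty."""
    return binary_reward(task, response, gold) - length_penalty(response)
```

Under these assumptions, a short correct response scores 1.0, an incorrect one scores 0.0, and a correct but overly long response scores somewhere in between, which is one simple way a scalar reward can trade off correctness against verbosity.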

Taxonomy

Topics: Topic Modeling · Multimodal Machine Learning Applications · Machine Learning in Healthcare