MediX-R1: Open Ended Medical Reinforcement Learning

Sahal Shaji Mullappilly; Mohammed Irfan Kurpath; Omair Mohamed; Mohamed Zidan; Fahad Khan; Salman Khan; Rao Anwer; Hisham Cholakkal

arXiv:2602.23363·cs.CV·February 27, 2026

TL;DR

MediX-R1 introduces a novel open-ended reinforcement learning framework for medical multimodal large language models, enabling clinically grounded, free-form reasoning and surpassing existing baselines with a comprehensive reward and evaluation system.

Contribution

It presents a new RL approach with multi-signal rewards and LLM-based evaluation for medical multimodal models, improving open-ended reasoning capabilities.

Findings

01

Outperforms strong open-source medical LLM and VLM baselines.

02

Achieves significant gains on open-ended clinical tasks.

03

Effective with only ~51K instruction examples.

Abstract

We introduce MediX-R1, an open-ended Reinforcement Learning (RL) framework for medical multimodal large language models (MLLMs) that enables clinically grounded, free-form answers beyond multiple-choice formats. MediX-R1 fine-tunes a baseline vision-language backbone with Group Based RL and a composite reward tailored for medical reasoning: an LLM-based accuracy reward that judges semantic correctness with a strict YES/NO decision, a medical embedding-based semantic reward to capture paraphrases and terminology variants, and lightweight format and modality rewards that enforce interpretable reasoning and modality recognition. This multi-signal design provides stable, informative feedback for open-ended outputs where traditional verifiable or MCQ-only rewards fall short. To measure progress, we propose a unified evaluation framework for both text-only and image+text tasks that uses a…
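
The composite reward described in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration: the weights in `WEIGHTS`, the function names, the use of cosine similarity for the embedding-based signal, and the mean/standard-deviation normalization within a group (one common group-based scheme). The abstract does not state how the four signals are weighted or combined, so this is not the authors' implementation.

```python
import math
from statistics import mean, pstdev

# Hypothetical weights: the abstract does not say how the signals are combined.
WEIGHTS = {"accuracy": 1.0, "semantic": 0.5, "format": 0.25, "modality": 0.25}


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def composite_reward(judge_verdict, answer_emb, reference_emb,
                     format_ok, modality_ok, weights=WEIGHTS):
    """Combine the four reward signals named in the abstract into one scalar.

    judge_verdict: the strict "YES"/"NO" string an LLM judge would return.
    answer_emb, reference_emb: embeddings from some medical text encoder
        (assumed to be computed elsewhere).
    format_ok, modality_ok: booleans from lightweight rule-based checks.
    """
    accuracy = 1.0 if judge_verdict.strip().upper() == "YES" else 0.0
    # Clamp at zero so dissimilar answers are not penalized below "no credit".
    semantic = max(0.0, cosine_similarity(answer_emb, reference_emb))
    return (weights["accuracy"] * accuracy
            + weights["semantic"] * semantic
            + weights["format"] * float(format_ok)
            + weights["modality"] * float(modality_ok))


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize rewards within one group of sampled answers to the same prompt."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Two sampled answers to the same prompt, with toy 2-d embeddings.
rewards = [
    composite_reward("YES", [1.0, 0.0], [1.0, 0.0], True, True),
    composite_reward("NO", [1.0, 0.0], [0.0, 1.0], False, True),
]
advantages = group_relative_advantages(rewards)
```

In this toy group the first answer receives a positive advantage and the second a negative one, which is the property a group-based policy update relies on: answers are scored relative to their siblings rather than against an absolute threshold.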

Code & Models

Datasets

MBZUAI/medix-rl-data
dataset· 668 dl

Taxonomy

Topics: Multimodal Machine Learning Applications · Artificial Intelligence in Healthcare and Education · Machine Learning in Healthcare