When Does RL Help Medical VLMs? Disentangling Vision, SFT, and RL Gains

Ahmadreza Jeddi; Kimia Shaban; Negin Baghbanzadeh; Natasha Sharan; Abhishek Moturu; Elham Dolatabadi; Babak Taati

arXiv:2603.01301·cs.CV·March 3, 2026

TL;DR

This study investigates when reinforcement learning (RL) improves medical vision-language models (VLMs). It finds that RL mainly sharpens the output distribution when the model already has non-trivial support (high Pass@K), and that supervised fine-tuning (SFT) is what expands that support and makes RL effective.

Contribution

The paper disentangles the effects of vision, SFT, and RL on medical VLMs, proposing a boundary-aware RL post-training method that improves performance across multiple benchmarks.

Findings

01. RL sharpens output distribution when support is high

02. SFT expands model support, enabling effective RL

03. Proposed boundary-aware recipe improves medical VQA performance

Abstract

Reinforcement learning (RL) is increasingly used to post-train medical Vision-Language Models (VLMs), yet it remains unclear whether RL improves medical visual reasoning or mainly sharpens behaviors already induced by supervised fine-tuning (SFT). We present a controlled study that disentangles these effects along three axes: vision, SFT, and RL. Using MedMNIST as a multi-modality testbed, we probe visual perception by benchmarking VLM vision towers against vision-only baselines, quantify reasoning support and sampling efficiency via Accuracy@1 versus Pass@K, and evaluate when RL closes the support gap and how gains transfer across modalities. We find that RL is most effective when the model already has non-trivial support (high Pass@K): it primarily sharpens the output distribution, improving Acc@1 and sampling efficiency, while SFT expands support and makes RL effective. Based on…
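
The Accuracy@1 versus Pass@K comparison mentioned in the abstract can be illustrated with a minimal sketch. It assumes the commonly used unbiased Pass@K estimator (1 - C(n-c, k) / C(n, k) for n samples with c correct) and treats Acc@1 as the expected accuracy of a single sample; the abstract does not say which estimator or decoding setup the authors use, so both choices, the function names, and the sample counts below are hypothetical.

```python
from math import comb
from statistics import mean


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased Pass@K estimate: probability that at least one of k
    samples, drawn without replacement from n generations of which c
    are correct, is correct."""
    if not (0 <= c <= n) or not (1 <= k <= n):
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        # Fewer than k incorrect samples exist, so any k-subset has a hit.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def summarize(correct_counts: list[int], n: int, k: int) -> tuple[float, float, float]:
    """Return (Acc@1, Pass@K, gap) averaged over questions.

    correct_counts[i] is the number of correct answers among n samples
    for question i. Acc@1 is taken here as single-sample accuracy (c / n),
    which is an assumption, not necessarily the paper's definition.
    """
    acc1 = mean(c / n for c in correct_counts)
    passk = mean(pass_at_k(n, c, k) for c in correct_counts)
    return acc1, passk, passk - acc1


if __name__ == "__main__":
    # Hypothetical counts for four questions, 8 samples each.
    acc1, passk, gap = summarize([8, 1, 0, 4], n=8, k=4)
    print(f"Acc@1={acc1:.3f}  Pass@4={passk:.3f}  gap={gap:.3f}")
```

Under this reading, a large gap between Pass@K and Acc@1 means correct answers are within the model's support but rarely sampled first, which is the regime where the abstract reports RL helping most. A question with zero correct samples contributes nothing to either metric, matching the claim that support has to be expanded (by SFT) before RL can sharpen it.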

Code & Models

Models

Hugging Face: armenjeddi/MedBridgeRL-OctoMed-7B-PMC-VQA-RL (model, 38 downloads, 2 likes)

Taxonomy

Topics: Multimodal Machine Learning Applications · Explainable Artificial Intelligence (XAI) · Domain Adaptation and Few-Shot Learning