Toward Effective Reinforcement Learning Fine-Tuning for Medical VQA in Vision-Language Models

Wenhui Zhu, Xuanzhao Dong, Xin Li, Peijie Qiu, Xiwen Chen, Abolfazl Razi, Aris Sotiras, Yi Su, and Yalin Wang

arXiv:2505.13973·cs.CL·May 21, 2025

TL;DR

This paper examines how reinforcement learning fine-tuning improves medical visual question answering in vision-language models, focusing on domain-specific challenges and showing that it outperforms supervised fine-tuning.

Contribution

It investigates four key factors affecting RL-based fine-tuning in medical VQA and shows that GRPO-based RL consistently outperforms supervised fine-tuning.

Findings

01. GRPO-based RL improves accuracy in medical VQA

02. Semantic alignment enhances model responses

03. Length-based rewards aid long-chain reasoning

Abstract

Recently, reinforcement learning (RL)-based tuning has shifted the trajectory of Multimodal Large Language Models (MLLMs), particularly following the introduction of Group Relative Policy Optimization (GRPO). However, directly applying it to medical tasks remains challenging for achieving clinically grounded model behavior. Motivated by the need to align model responses with clinical expectations, we investigate four critical dimensions that affect the effectiveness of RL-based tuning in medical visual question answering (VQA): base model initialization strategy, the role of medical semantic alignment, the impact of length-based rewards on long-chain reasoning, and the influence of bias. We conduct extensive experiments to analyze these factors for medical MLLMs, providing new insights into how models are domain-specifically fine-tuned. Our results also demonstrate that…
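
For readers unfamiliar with GRPO, the core idea of scoring each sampled response relative to the others in its group can be sketched as follows. This is a minimal illustration and not the authors' implementation: the function name `group_relative_advantages`, the `eps` constant, the use of the population standard deviation, and the example rewards are all assumptions made for this sketch.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against the mean and spread of its own group.

    `rewards` holds one scalar reward per response sampled for the same
    question. `eps` (an assumed constant) avoids division by zero when
    every response in the group receives the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; implementations may differ
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for four answers sampled for one medical VQA question
# (1.0 = judged correct, 0.0 = judged incorrect).
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
```

Responses that score above the group mean get a positive advantage and those below get a negative one, so no separate value network is needed to form a baseline. Any additional reward terms, such as the length-based rewards studied in the paper, would enter through `rewards` before this normalization.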

Taxonomy

Topics: EEG and Brain-Computer Interfaces · Brain Tumor Detection and Classification

Methods: Balanced Selection · ALIGN