MedGRPO: Multi-Task Reinforcement Learning for Heterogeneous Medical Video Understanding
Yuhao Su, Anwesa Choudhuri, Zhongpai Gao, Benjamin Planche, Van Nguyen Nguyen, Meng Zheng, Yuhan Shen, Arun Innanje, Terrence Chen, Ehsan Elhamifar, Ziyan Wu

TL;DR
This paper introduces MedGRPO, a novel reinforcement learning framework for medical video understanding, addressing dataset imbalance issues, and establishing a large-scale benchmark called MedVidBench.
Contribution
The paper presents MedGRPO, a new RL method with reward normalization and a medical LLM judge, improving multi-dataset training for medical video understanding.
Findings
Supervised fine-tuning with Qwen2.5-VL-7B outperforms GPT-4.1 and Gemini-2.5-Flash.
MedGRPO further enhances grounding and captioning performance.
MedVidBench is a large, expert-validated benchmark for medical video tasks.
Abstract
Large vision-language models struggle with medical video understanding, where spatial precision, temporal reasoning, and clinical semantics are critical. To address this, we first introduce \textbf{MedVidBench}, a large-scale benchmark of 531,850 video-instruction pairs across 8 medical sources spanning video, segment, and frame-level tasks, curated through a rigorous quality assurance pipeline with expert-guided prompting and dual-model validation. While supervised fine-tuning on MedVidBench yields noticeable gains, standard Reinforcement Learning (RL) fails due to imbalanced reward scales across datasets, which destabilizes optimization and leads to training collapse. To overcome this, we introduce \textbf{MedGRPO}, a novel RL framework for balanced multi-dataset training with two key innovations: (1) \emph{cross-dataset reward normalization} that maps each dataset's median…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Code & Models
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
