MedGRPO: Multi-Task Reinforcement Learning for Heterogeneous Medical Video Understanding

Yuhao Su; Anwesa Choudhuri; Zhongpai Gao; Benjamin Planche; Van Nguyen Nguyen; Meng Zheng; Yuhan Shen; Arun Innanje; Terrence Chen; Ehsan Elhamifar; Ziyan Wu

arXiv:2512.06581·cs.CV·April 9, 2026

TL;DR

This paper introduces MedGRPO, a reinforcement learning framework for medical video understanding that addresses imbalanced reward scales across datasets, and MedVidBench, a large-scale benchmark for the same tasks.

Contribution

The paper presents MedGRPO, an RL method that combines cross-dataset reward normalization with a medical LLM judge to stabilize multi-dataset training for medical video understanding.

Findings

01

Qwen2.5-VL-7B, after supervised fine-tuning on MedVidBench, outperforms GPT-4.1 and Gemini-2.5-Flash.

02

MedGRPO further enhances grounding and captioning performance.

03

MedVidBench is a large, expert-validated benchmark for medical video tasks.

Abstract

Large vision-language models struggle with medical video understanding, where spatial precision, temporal reasoning, and clinical semantics are critical. To address this, we first introduce MedVidBench, a large-scale benchmark of 531,850 video-instruction pairs across 8 medical sources spanning video, segment, and frame-level tasks, curated through a rigorous quality assurance pipeline with expert-guided prompting and dual-model validation. While supervised fine-tuning on MedVidBench yields noticeable gains, standard Reinforcement Learning (RL) fails due to imbalanced reward scales across datasets, which destabilizes optimization and leads to training collapse. To overcome this, we introduce MedGRPO, a novel RL framework for balanced multi-dataset training with two key innovations: (1) cross-dataset reward normalization that maps each dataset's median…

Code & Models

Repositories

https://uii-america.github.io/MedGRPO
github
