MedVLThinker: Simple Baselines for Multimodal Medical Reasoning
Xiaoke Huang, Juncheng Wu, Hui Liu, Xianfeng Tang, Yuyin Zhou

TL;DR
MedVLThinker is an open, reproducible framework for medical multimodal reasoning. It shows that training on text-only data with reinforcement learning yields the strongest performance, and it reports state-of-the-art results on medical question answering benchmarks.
Contribution
The paper provides a simple, strong baseline recipe for medical multimodal reasoning models, including data curation, training paradigms, and open-source resources, advancing reproducibility and performance.
Findings
Reinforcement Learning with Verifiable Rewards (RLVR) outperforms Supervised Fine-Tuning (SFT) in the paper's experiments.
Training on text-only data improves performance more than training on multimodal data.
The 7B model achieves state-of-the-art results on medical VQA benchmarks.
Abstract
Large Reasoning Models (LRMs) have introduced a new paradigm in AI by enabling models to "think before responding" via chain-of-thought reasoning. However, the absence of open and reproducible recipes for building reasoning-centric medical LMMs hinders community-wide research, analysis, and comparison. In this paper, we present MedVLThinker, a suite of simple yet strong baselines. Our fully open recipe consists of: (1) systematic data curation for both text-only and image-text medical data, filtered according to varying levels of reasoning difficulty, and (2) two training paradigms: Supervised Fine-Tuning (SFT) on distilled reasoning traces and Reinforcement Learning with Verifiable Rewards (RLVR) based on final answer correctness. Across extensive experiments on the Qwen2.5-VL model family (3B, 7B) and six medical QA benchmarks, we find that RLVR consistently and significantly…
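The abstract describes RLVR as rewarding final answer correctness and mentions filtering data by reasoning difficulty. The sketch below illustrates what such a reward could look like for multiple-choice questions, plus a pass-rate statistic that could serve as one possible difficulty proxy. It is not the authors' implementation: the `<answer>…</answer>` output format, the function names, and the use of pass rate as a difficulty measure are all assumptions made for illustration.

```python
import re
from typing import List, Optional

# Assumed (hypothetical) output format: the model wraps its final option
# letter in <answer>...</answer>. The paper's real format is not shown here.
ANSWER_RE = re.compile(r"<answer>\s*([A-Za-z])\s*</answer>")


def extract_choice(response: str) -> Optional[str]:
    """Return the last tagged option letter, upper-cased, or None if absent."""
    matches = ANSWER_RE.findall(response)
    return matches[-1].upper() if matches else None


def verifiable_reward(response: str, gold: str) -> float:
    """Binary reward: 1.0 if the extracted answer matches the gold letter."""
    choice = extract_choice(response)
    if choice is None:
        return 0.0  # untagged or malformed answers earn no reward
    return 1.0 if choice == gold.strip().upper() else 0.0


def pass_rate(responses: List[str], gold: str) -> float:
    """Fraction of sampled responses that are correct (a difficulty proxy)."""
    if not responses:
        return 0.0
    return sum(verifiable_reward(r, gold) for r in responses) / len(responses)
```

Under these assumptions, a question whose sampled responses all pass or all fail gives little signal to separate better from worse responses, so a curation step might keep questions with an intermediate pass rate. Whether MedVLThinker filters this way is not stated in this excerpt.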
Code & Models
- 🤗 MedVLSynther/MedVLSynther-3B-RL_1K (model, 2 dl)
- 🤗 MedVLSynther/MedVLSynther-3B-RL_2K (model)
- 🤗 MedVLSynther/MedVLSynther-3B-RL_5K (model, 1 dl)
- 🤗 MedVLSynther/MedVLSynther-3B-RL_10K (model, 1 dl)
- 🤗 MedVLSynther/MedVLSynther-3B-RL_13K (model)
- 🤗 MedVLSynther/MedVLSynther-3B-RL_5K_qwen-glm (model, 3 dl)
- 🤗 MedVLSynther/MedVLSynther-3B-RL_5K_internvl-glm (model, 3 dl)
- 🤗 MedVLSynther/MedVLSynther-3B-RL_5K_glm-glm (model, 3 dl)
- 🤗 MedVLSynther/MedVLSynther-3B-RL_5K_no-verify (model)
- 🤗 MedVLSynther/MedVLSynther-3B-RL_5K_PMC-style (model, 8 dl)
Taxonomy
Topics: Multimodal Machine Learning Applications · Machine Learning in Healthcare · Topic Modeling
