MedVLThinker: Simple Baselines for Multimodal Medical Reasoning

Xiaoke Huang; Juncheng Wu; Hui Liu; Xianfeng Tang; Yuyin Zhou

arXiv:2508.02669 · cs.CV · February 19, 2026

TL;DR

MedVLThinker introduces an open, reproducible framework for medical multimodal reasoning. It shows that reinforcement learning on text-only data outperforms training on multimodal data, and it sets new state-of-the-art results in medical question answering.

Contribution

The paper provides a simple but strong baseline recipe for medical multimodal reasoning models. The recipe covers data curation, two training paradigms (SFT and RLVR), and open-source resources, with the aim of improving both reproducibility and performance.

Findings

1. RLVR consistently outperforms SFT in the paper's experiments.
2. Under RLVR, training on text-only data improves performance more than training on multimodal (image-text) data.
3. The 7B model achieves state-of-the-art results on medical VQA benchmarks.

Abstract

Large Reasoning Models (LRMs) have introduced a new paradigm in AI by enabling models to "think before responding" via chain-of-thought reasoning. However, the absence of open and reproducible recipes for building reasoning-centric medical large multimodal models (LMMs) hinders community-wide research, analysis, and comparison. In this paper, we present MedVLThinker, a suite of simple yet strong baselines. Our fully open recipe consists of: (1) systematic data curation for both text-only and image-text medical data, filtered according to varying levels of reasoning difficulty, and (2) two training paradigms: Supervised Fine-Tuning (SFT) on distilled reasoning traces and Reinforcement Learning with Verifiable Rewards (RLVR) based on final answer correctness. Across extensive experiments on the Qwen2.5-VL model family (3B, 7B) and six medical QA benchmarks, we find that RLVR consistently and significantly…
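The abstract names two mechanisms without giving their details: a reward "based on final answer correctness" and filtering data "according to varying levels of reasoning difficulty". The Python below is a minimal sketch of how these two ideas can fit together, not the authors' implementation. Everything in it is an assumption: the `<answer>X</answer>` output format, the multiple-choice letter answers, the use of a sampled pass rate as the difficulty measure, the thresholds, and the function names (`extract_choice`, `verifiable_reward`, `pass_rate`, `keep_for_rl`), which are all hypothetical.

```python
import re
from typing import Optional, Sequence

# Hypothetical output convention (assumption, not from the paper): the model ends
# its reasoning with "<answer>X</answer>", where X is a multiple-choice option letter.
ANSWER_PATTERN = re.compile(r"<answer>\s*([A-Za-z])\s*</answer>")


def extract_choice(response: str) -> Optional[str]:
    """Return the last option letter inside <answer> tags, uppercased, or None."""
    matches = ANSWER_PATTERN.findall(response)
    return matches[-1].upper() if matches else None


def verifiable_reward(response: str, gold: str) -> float:
    """Binary reward: 1.0 if the extracted final answer matches the gold option, else 0.0.
    Only the final answer is checked; the reasoning text itself is not scored."""
    choice = extract_choice(response)
    return 1.0 if choice is not None and choice == gold.strip().upper() else 0.0


def pass_rate(responses: Sequence[str], gold: str) -> float:
    """Fraction of sampled responses whose final answer is correct."""
    if not responses:
        return 0.0
    return sum(verifiable_reward(r, gold) for r in responses) / len(responses)


def keep_for_rl(responses: Sequence[str], gold: str,
                low: float = 0.0, high: float = 1.0) -> bool:
    """Hypothetical difficulty filter: keep an item only if low < pass rate < high,
    i.e. drop items the model always solves or never solves."""
    rate = pass_rate(responses, gold)
    return low < rate < high


if __name__ == "__main__":
    samples = [
        "The lesion is hyperintense on T2, so... <answer>B</answer>",
        "Reasoning... <answer>C</answer>",
        "no answer tag here",
        "<answer>b</answer>",
    ]
    print(verifiable_reward(samples[0], "B"))  # 1.0
    print(pass_rate(samples, "B"))             # 0.5
    print(keep_for_rl(samples, "B"))           # True
```

The filter drops items with a pass rate of exactly 0 or 1. This is a common heuristic in RLVR pipelines: when every sampled response receives the same reward, the reward says little about which responses are better. Whether MedVLThinker uses this criterion, or different thresholds, is not stated here.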

Taxonomy

Topics: Multimodal Machine Learning Applications · Machine Learning in Healthcare · Topic Modeling