Med3D-R1: Incentivizing Clinical Reasoning in 3D Medical Vision-Language Models for Abnormality Diagnosis

Haoran Lai; Zihang Jiang; Kun Zhang; Qingsong Yao; Rongsheng Wang; Zhiyang He; Xiaodong Tao; Wei Wei; Shaohua Kevin Zhou

arXiv:2602.01200 · cs.CV · February 3, 2026

Open Access

TL;DR

Med3D-R1 introduces a reinforcement learning framework with a two-stage training process that enhances clinical reasoning and diagnostic accuracy in 3D medical vision-language models, addressing interpretability and overfitting issues.

Contribution

The paper presents Med3D-R1, a novel RL-based training framework with residual alignment and abnormality re-weighting strategies for improved 3D medical diagnosis.

Findings

01. Achieved state-of-the-art accuracy on CT-RATE and RAD-ChestCT benchmarks.

02. Enhanced clinical reasoning and interpretability in 3D medical vision-language models.

03. Outperformed prior methods in abnormality diagnosis tasks.

Abstract

Developing 3D vision-language models with robust clinical reasoning remains a challenge due to the inherent complexity of volumetric medical imaging, the tendency of models to overfit superficial report patterns, and the lack of interpretability-aware reward designs. In this paper, we propose Med3D-R1, a reinforcement learning framework with a two-stage training process: Supervised Fine-Tuning (SFT) followed by Reinforcement Learning (RL). During the SFT stage, we introduce a residual alignment mechanism to bridge the gap between high-dimensional 3D features and textual embeddings, and an abnormality re-weighting strategy to emphasize clinically informative tokens and reduce structural bias in reports. In the RL stage, we redesign the consistency reward to explicitly promote coherent, step-by-step diagnostic reasoning. We evaluate our method on medical multiple-choice visual question answering using…
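
The abstract does not give the exact form of the abnormality re-weighting strategy, so the following is only a minimal sketch of one common way to emphasize selected tokens during SFT: a weighted negative log-likelihood in which tokens flagged as clinically informative count more than template-like tokens. The function name `weighted_token_nll`, the per-token `is_abnormal` flags, and the `abnormal_weight` value are all hypothetical and are not taken from the paper.

```python
import math


def weighted_token_nll(token_log_probs, is_abnormal, abnormal_weight=2.0):
    """Weighted mean negative log-likelihood over report tokens.

    token_log_probs: log-probability the model assigned to each target token.
    is_abnormal: hypothetical per-token flags for clinically informative tokens.
    abnormal_weight: hypothetical up-weighting factor (an assumption, not the
        paper's value); 1.0 recovers the ordinary unweighted mean.
    """
    if len(token_log_probs) != len(is_abnormal):
        raise ValueError("token_log_probs and is_abnormal must have equal length")
    if not token_log_probs:
        raise ValueError("at least one token is required")

    # Flagged tokens receive abnormal_weight, all other tokens receive 1.0.
    weights = [abnormal_weight if flag else 1.0 for flag in is_abnormal]

    # Weighted sum of per-token negative log-likelihoods.
    total = sum(w * -lp for w, lp in zip(weights, token_log_probs))

    # Normalize by the total weight so the loss scale stays comparable.
    return total / sum(weights)


# Usage: the second token is flagged as abnormality-related and weighted 3x.
log_probs = [math.log(0.5), math.log(0.25)]
loss = weighted_token_nll(log_probs, [False, True], abnormal_weight=3.0)
```

Normalizing by the sum of weights is a design choice in this sketch: it keeps the loss on the same scale as plain cross-entropy, so changing the weight shifts emphasis between tokens rather than the overall loss magnitude.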

Taxonomy

Topics: Multimodal Machine Learning Applications · Machine Learning in Healthcare · Domain Adaptation and Few-Shot Learning