Med3D-R1: Incentivizing Clinical Reasoning in 3D Medical Vision-Language Models for Abnormality Diagnosis
Haoran Lai, Zihang Jiang, Kun Zhang, Qingsong Yao, Rongsheng Wang, Zhiyang He, Xiaodong Tao, Wei Wei, Shaohua Kevin Zhou

TL;DR
Med3D-R1 introduces a reinforcement learning framework with a two-stage training process that enhances clinical reasoning and diagnostic accuracy in 3D medical vision-language models, addressing interpretability and overfitting issues.
Contribution
The paper presents Med3D-R1, a novel RL-based training framework with residual alignment and abnormality re-weighting strategies for improved 3D medical diagnosis.
Findings
Achieved state-of-the-art accuracy on CT-RATE and RAD-ChestCT benchmarks.
Enhanced clinical reasoning and interpretability in 3D medical vision-language models.
Outperformed prior methods in abnormality diagnosis tasks.
Abstract
Developing 3D vision-language models with robust clinical reasoning remains a challenge due to the inherent complexity of volumetric medical imaging, the tendency of models to overfit superficial report patterns, and the lack of interpretability-aware reward designs. In this paper, we propose Med3D-R1, a reinforcement learning framework with a two-stage training process: Supervised Fine-Tuning (SFT) followed by Reinforcement Learning (RL). During the SFT stage, we introduce a residual alignment mechanism to bridge the gap between high-dimensional 3D features and textual embeddings, and an abnormality re-weighting strategy to emphasize clinically informative tokens and reduce structural bias in reports. In the RL stage, we redesign the consistency reward to explicitly promote coherent, step-by-step diagnostic reasoning. We evaluate our method on medical multiple-choice visual question answering using…
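The abstract does not spell out how abnormality re-weighting is computed. One plausible reading is a token-level loss in which tokens that describe abnormalities receive a larger weight than template-like report text. The sketch below illustrates that idea only; the function name `weighted_token_nll`, the `abnormal_weight` parameter, the per-token flags, and the normalization by the sum of weights are all assumptions for illustration, not the paper's formulation.

```python
import math

def weighted_token_nll(token_log_probs, is_abnormality_token, abnormal_weight=2.0):
    """Weighted mean negative log-likelihood over report tokens.

    Hypothetical illustration: tokens flagged as abnormality-related get
    weight `abnormal_weight`; all other tokens get weight 1.0.
    """
    if len(token_log_probs) != len(is_abnormality_token):
        raise ValueError("log-probs and flags must have the same length")
    if not token_log_probs:
        raise ValueError("at least one token is required")
    weights = [abnormal_weight if flag else 1.0 for flag in is_abnormality_token]
    # Sum of weight * (-log p) for each token, normalized by the total weight.
    total = sum(w * -lp for w, lp in zip(weights, token_log_probs))
    return total / sum(weights)

# Toy report: two well-predicted template tokens and one poorly predicted
# abnormality token (the flags are made up for this example).
log_probs = [math.log(0.9), math.log(0.9), math.log(0.5)]
flags = [False, False, True]

uniform = weighted_token_nll(log_probs, flags, abnormal_weight=1.0)
reweighted = weighted_token_nll(log_probs, flags, abnormal_weight=3.0)
print(uniform, reweighted)
```

With a weight above 1, the poorly predicted abnormality token contributes more to the loss, so the reweighted value exceeds the uniform one. In a setup like this, the gradient signal shifts toward clinically informative tokens rather than boilerplate phrasing.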
Taxonomy
Topics: Multimodal Machine Learning Applications · Machine Learning in Healthcare · Domain Adaptation and Few-Shot Learning
