MedReason-R1: Learning to Reason for CT Diagnosis with Reinforcement Learning and Local Zoom
Yifan Li, Fenghe Tang, Yingtai Li, Shaohua Kevin Zhou

TL;DR
MedReason-R1 is a medical vision-language model that combines reinforcement learning with a zoom-in strategy to improve CT diagnosis accuracy. It targets two gaps: the scarcity of large, specialized medical imaging datasets and the absence of coarse-to-fine diagnostic reasoning in existing models.
Contribution
It introduces a new dataset, a zoom-in reasoning strategy, and a reinforcement learning framework for improved CT diagnosis in medical VLMs.
Findings
Achieves state-of-the-art CT diagnosis performance
Effectively incorporates disease region localization
Demonstrates strong generalization capabilities
Abstract
General-purpose large Vision-Language Models (VLMs) demonstrate strong capabilities in generating detailed descriptions for natural images. However, their performance in the medical domain remains suboptimal, even for relatively straightforward tasks, primarily due to the lack of large-scale, high-quality, specialized medical imaging datasets and the neglect of the coarse-to-fine diagnostic process. To address the first issue, we construct the CT-RATE-VQA dataset, which contains 84K QA pairs. For the second issue, we propose MedReason-R1, a medical VLM with an explicit reasoning process for disease diagnosis. MedReason-R1 incorporates a novel strategy that embeds zoom-in disease region-of-interest areas into the image, highlighting the crucial role of both global localization and disease-specific details in enhancing the model's diagnostic performance. Furthermore,…
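The abstract describes embedding a zoomed-in region of interest into the image but gives no layout details here. The sketch below illustrates one possible reading of that idea: crop a bounding box, magnify it, and paste the magnified patch into a corner of the same slice so that global context and local detail share one input. The function name `embed_zoom_in`, the top-right placement, the integer `scale`, and nearest-neighbour upscaling are all assumptions made for illustration, not the authors' implementation.

```python
import numpy as np

def embed_zoom_in(image, box, scale=2):
    """Return a copy of `image` with a magnified crop pasted in the top-right corner.

    image: 2D (H, W) or 3D (H, W, C) array representing one CT slice.
    box:   (y0, x0, y1, x1) region of interest in pixel coordinates (hypothetical format).
    scale: integer magnification factor (assumed; the source does not specify one).
    """
    y0, x0, y1, x1 = box
    roi = image[y0:y1, x0:x1]

    # Nearest-neighbour upscaling: repeat each pixel `scale` times along both axes.
    zoomed = np.repeat(np.repeat(roi, scale, axis=0), scale, axis=1)

    zh, zw = zoomed.shape[:2]
    h, w = image.shape[:2]
    if zh > h or zw > w:
        raise ValueError("zoomed region does not fit inside the image")

    # Work on a copy so the original slice is left untouched.
    out = image.copy()
    out[:zh, w - zw:] = zoomed
    return out

# Toy example: an 8x8 "slice" with a 2x2 region of interest magnified 2x.
image = np.arange(64).reshape(8, 8)
out = embed_zoom_in(image, box=(2, 2, 4, 4), scale=2)
```

In this sketch the pasted patch overwrites part of the original slice; a real pipeline might instead pad the canvas or choose a corner away from the anatomy, which the text above does not specify.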
Taxonomy
Topics: Multimodal Machine Learning Applications · COVID-19 diagnosis using AI · Advanced Neural Network Applications
