Improving Medical Reasoning with Curriculum-Aware Reinforcement Learning
Shaohao Rui, Kaitao Chen, Weijie Ma, Xiaosong Wang

TL;DR
This paper presents MedCCO, a curriculum-aware reinforcement learning framework that improves medical visual question answering by unifying close-ended and open-ended tasks, leading to better reasoning and generalization.
Contribution
MedCCO is the first multimodal RL framework for medical VQA that integrates close-ended and open-ended data through curriculum learning, enhancing reasoning and interpretability.
Findings
Achieves an 11.4% accuracy gain on in-domain tasks.
Improves performance on out-of-domain benchmarks by 5.7%.
Enhances reasoning and generalization in medical VQA models.
Abstract
Recent advances in reinforcement learning with verifiable, rule-based rewards have greatly enhanced the reasoning capabilities and out-of-distribution generalization of VLMs/LLMs, obviating the need for manually crafted reasoning chains. Despite these promising developments in the general domain, their translation to medical imaging remains limited. Current medical reinforcement fine-tuning (RFT) methods predominantly focus on close-ended VQA, thereby restricting the model's ability to engage in world knowledge retrieval and flexible task adaptation. More critically, these methods fall short of addressing the critical clinical demand for open-ended, reasoning-intensive decision-making. To bridge this gap, we introduce MedCCO, the first multimodal reinforcement learning framework tailored for medical VQA that unifies close-ended and open-ended data within a curriculum-driven RFT…
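The idea of rule-based, verifiable rewards applied to both close-ended and open-ended questions can be illustrated with a small sketch. This is not the paper's implementation: the function names, the exact-match and token-F1 reward rules, the `kind` and `answer` fields, and the close-ended-first ordering are all assumptions made for illustration only.

```python
import re
from collections import Counter


def normalize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def close_ended_reward(prediction, reference):
    """Assumed rule: 1.0 on a normalized exact match, else 0.0."""
    return 1.0 if normalize(prediction) == normalize(reference) else 0.0


def open_ended_reward(prediction, reference):
    """Assumed rule: token-level F1 between prediction and reference."""
    pred, ref = normalize(prediction), normalize(reference)
    if not pred or not ref:
        return 0.0
    # Multiset intersection counts tokens shared by both answers.
    overlap = sum((Counter(pred) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


def curriculum_order(samples):
    """Hypothetical curriculum: close-ended samples first, then open-ended.

    sorted() is stable, so the original order is kept within each group.
    """
    return sorted(samples, key=lambda s: 0 if s["kind"] == "close" else 1)


def reward(sample, prediction):
    """Dispatch to the reward rule that matches the question type."""
    if sample["kind"] == "close":
        return close_ended_reward(prediction, sample["answer"])
    return open_ended_reward(prediction, sample["answer"])
```

In a sketch like this, the reward for a close-ended item is binary, while an open-ended item earns partial credit for overlapping content, which is one simple way both question types could feed a single RL objective.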
Taxonomy
Topics: Intelligent Tutoring Systems and Adaptive Learning
Methods: Focus · Sparse Evolutionary Training
