Adaptive Reinforcement for Open-ended Medical Reasoning via Semantic-Guided Reward Collapse Mitigation

Yizhou Liu; Dingkang Yang; Zizhi Chen; Minghao Han; Xukun Zhang; Keliang Liu; Jingwei Wei; Lihua Zhang

arXiv:2508.12957 · cs.CV · April 3, 2026

TL;DR

This paper introduces ARMed, an RL framework that enhances open-ended medical VQA by using adaptive semantic rewards and domain expertise to improve reasoning accuracy and generalization.

Contribution

ARMed combines supervised fine-tuning with adaptive semantic rewards to address reward collapse in open-ended medical VQA, with the aim of improving model robustness.

Findings

01. ARMed outperforms existing methods on six medical VQA benchmarks.

02. Adaptive semantic rewards improve reasoning consistency and factual accuracy.

03. Reward discriminability is crucial for effective medical reinforcement learning.

Abstract

Reinforcement learning (RL) with rule-based reward functions has recently shown great promise in enhancing the reasoning depth and generalization ability of vision-language models (VLMs) while maintaining computational efficiency. Despite these advances, its adoption in medical imaging remains limited. Current reinforcement fine-tuning (RFT) efforts in this field mainly focus on closed-ended visual question answering (VQA), which restricts their applicability to realistic clinical reasoning. Open-ended medical VQA better mirrors clinical diagnostic workflows but remains underexplored. Although several studies have attempted to bridge the two formats through semantically guided RL, model-driven semantic rewards often suffer from reward collapse, where responses with distinct semantics yield nearly identical scores. To overcome this limitation, we introduce Adaptive…
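To make the reward collapse described in the abstract concrete, the sketch below uses made-up semantic-similarity scores for four candidate answers that sit in a narrow band. It then applies a simple within-group min-max rescaling to widen the gaps. The scores, the function names `score_spread` and `rescale_within_group`, and the rescaling rule are all illustrative assumptions. This is not ARMed's adaptive reward, whose details are not given in the text above.

```python
def score_spread(scores):
    """Max minus min: a crude proxy for how well rewards separate responses."""
    return max(scores) - min(scores)


def rescale_within_group(scores, eps=1e-8):
    """Min-max rescale one group of scores to [0, 1].

    Hypothetical mitigation for illustration only; it preserves the
    ranking of responses while stretching the gaps between them.
    """
    lo, hi = min(scores), max(scores)
    if hi - lo < eps:
        # All scores are effectively identical: no ranking signal to recover.
        return [0.5 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


# Hypothetical similarity scores for four semantically different answers.
# They differ in meaning, yet the raw scores are nearly identical.
raw = [0.91, 0.89, 0.90, 0.88]
rescaled = rescale_within_group(raw)

print("raw spread:     ", score_spread(raw))
print("rescaled spread:", score_spread(rescaled))
```

With raw scores this close together, the best and worst answers receive almost the same reward, so the learning signal that separates them is weak. The rescaled scores keep the same ordering but span the full unit interval.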
