MedEyes: Learning Dynamic Visual Focus for Medical Progressive Diagnosis
Chunzheng Zhu, Yangfang Lin, Shen Chen, Yijun Wang, Jianxin Lin

TL;DR
MedEyes introduces a reinforcement learning framework that mimics clinician visual reasoning by dynamically attending to medical images, integrating expert guidance and dual-mode exploration to improve medical visual question answering accuracy.
Contribution
The paper presents MedEyes, an RL-based framework with a Gaze-guided Reasoning Navigator (GRN) and a confidence sampler, intended to improve clinical reasoning and interpretability in medical AI.
Findings
Achieves an improvement of 8.5 percentage points on medical VQA benchmarks.
Effectively models clinician-like visual search and reasoning processes.
Demonstrates robustness and trustworthiness in medical diagnosis tasks.
Abstract
Accurate medical diagnosis often involves progressive visual focusing and iterative reasoning, characteristics commonly observed in clinical workflows. While recent vision-language models demonstrate promising chain-of-thought (CoT) reasoning capabilities via reinforcement learning with verifiable rewards (RLVR), their purely on-policy learning paradigm tends to reinforce superficially coherent but clinically inaccurate reasoning paths. We propose MedEyes, a novel reinforcement learning framework that dynamically models clinician-style diagnostic reasoning by progressively attending to and interpreting relevant medical image regions. By incorporating off-policy expert guidance, MedEyes converts expert visual search trajectories into structured external behavioral signals, guiding the model toward clinically aligned visual reasoning. We design the Gaze-guided Reasoning Navigator (GRN) to…
Taxonomy
Topics: Multimodal Machine Learning Applications · Machine Learning in Healthcare · Explainable Artificial Intelligence (XAI)
