Bridging the Gap in Ophthalmic AI: MM-Retinal-Reason Dataset and OphthaReason Model toward Dynamic Multimodal Reasoning
Ruiqi Wu, Yuang Yao, Tengfei Ma, Chenran Zhang, Na Su, Tao Zhou, Geng Chen, Wen Fan, Yi Zhou

TL;DR
This paper introduces MM-Retinal-Reason, a comprehensive ophthalmic multimodal dataset, and OphthaReason, a reasoning model with dynamic uncertainty estimation, to improve complex clinical reasoning in ophthalmology AI.
Contribution
It presents the first ophthalmic multimodal dataset covering basic and complex reasoning, and a novel dynamic reasoning model with uncertainty estimation for improved clinical inference.
Findings
Achieved state-of-the-art performance on ophthalmic reasoning tasks.
Outperformed existing models by at least 15-25%, depending on the baseline category.
Demonstrated effectiveness of uncertainty-aware dynamic reasoning.
Abstract
Multimodal large language models (MLLMs) have recently demonstrated remarkable reasoning abilities under the reinforcement learning paradigm. Although several multimodal reasoning models have been explored in the medical domain, most of them focus exclusively on basic reasoning, that is, shallow inference based on visual feature matching. Real-world clinical diagnosis, however, extends beyond basic reasoning: it demands reasoning processes that integrate heterogeneous clinical information (such as chief complaints and medical history) with multimodal medical imaging data. To bridge this gap, we introduce MM-Retinal-Reason, the first ophthalmic multimodal dataset covering the full spectrum of perception and reasoning. It encompasses both basic and complex reasoning tasks, aiming to enhance visual-centric fundamental reasoning capabilities and emulate realistic clinical…
