Fundus-R1: Training a Fundus-Reading MLLM with Knowledge-Aware Reasoning on Public Data

Yuchuan Deng; Qijie Wei; Kaiheng Qian; Jiazhen Liu; Zijie Xin; Bangxiang Lan; Jingyu Liu; Jianfeng Dong; Xirong Li

arXiv:2604.08322 · cs.CV · April 10, 2026

TL;DR

Fundus-R1 is a knowledge-aware, reasoning-enhanced multimodal large language model trained solely on public datasets for improved fundus image understanding and diagnosis.

Contribution

It introduces a RAG-based reasoning trace generation method and enhances RLVR with process rewards, enabling effective training on publicly available data.

Findings

01. Fundus-R1 outperforms baseline models on three fundus-reading benchmarks.

02. The RAG-based reasoning traces improve the model's interpretability and accuracy.

03. Self-consistency rewards enhance the reasoning quality during training.
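
To make the idea of adding a process reward to RLVR concrete, here is a minimal sketch of how a verifiable outcome reward could be combined with a self-consistency term for a single rollout. This summary does not say how the reward is actually computed, so everything below is an assumption for illustration: the `<think>`/`<answer>` tag format, the function names, the substring-based consistency proxy, and the `weight` value are all hypothetical and are not taken from the paper.

```python
import re


def parse_rollout(text: str) -> tuple[str, str]:
    """Split a rollout into (reasoning, answer).

    Assumes hypothetical <think>...</think> and <answer>...</answer> tags.
    A missing tag yields an empty string for that part.
    """
    think = re.search(r"<think>(.*?)</think>", text, re.DOTALL)
    answer = re.search(r"<answer>(.*?)</answer>", text, re.DOTALL)
    reasoning = think.group(1).strip() if think else ""
    final = answer.group(1).strip() if answer else ""
    return reasoning, final


def outcome_reward(answer: str, gold: str) -> float:
    """Verifiable reward: 1.0 if the final answer matches the image-level label."""
    return 1.0 if answer.lower() == gold.lower() else 0.0


def consistency_reward(reasoning: str, answer: str) -> float:
    """Toy process reward (an assumption, not the paper's definition):
    1.0 if the final answer is named somewhere in the reasoning trace."""
    if not answer:
        return 0.0
    return 1.0 if answer.lower() in reasoning.lower() else 0.0


def total_reward(text: str, gold: str, weight: float = 0.5) -> float:
    """Outcome reward plus a weighted process reward for one rollout."""
    reasoning, answer = parse_rollout(text)
    return outcome_reward(answer, gold) + weight * consistency_reward(reasoning, answer)


consistent = (
    "<think>Microaneurysms and hard exudates suggest diabetic retinopathy.</think>"
    "<answer>Diabetic retinopathy</answer>"
)
inconsistent = (
    "<think>Optic disc cupping suggests glaucoma.</think>"
    "<answer>Diabetic retinopathy</answer>"
)
print(total_reward(consistent, "diabetic retinopathy"))
print(total_reward(inconsistent, "diabetic retinopathy"))
```

In this toy setup, both rollouts give the correct final label, but only the first one earns the extra term, because its reasoning trace actually supports the answer it ends with. A real system would need a far more robust consistency check than substring matching.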

Abstract

Fundus imaging, such as CFP, OCT and UWF, is crucial for the early detection of retinal anomalies and diseases. Fundus image understanding, due to its knowledge-intensive nature, poses a challenging vision-language task. An emerging approach to addressing the task is to post-train a generic multimodal large language model (MLLM), either by supervised finetuning (SFT) or by reinforcement learning with verifiable rewards (RLVR), on a considerable amount of in-house samples paired with high-quality clinical reports. However, these valuable samples are not publicly accessible, which not only hinders reproducibility but also practically limits research to a few players. To overcome this barrier, we make a novel attempt to train a reasoning-enhanced fundus-reading MLLM, which we term Fundus-R1, using exclusively public datasets, wherein over 94% of the data are annotated with only image-level…
