MM-Retinal: Knowledge-Enhanced Foundational Pretraining with Fundus Image-Text Expertise

Ruiqi Wu; Chenran Zhang; Jianle Zhang; Yi Zhou; Tao Zhou; Huazhu Fu

arXiv:2405.11793 · cs.CV · August 27, 2025 · 1 citation

TL;DR

This paper introduces MM-Retinal, a multi-modal fundus image-text dataset, and KeepFIT, a knowledge-enhanced pretraining model that leverages expert knowledge for improved transferability and generalization in fundus image analysis.

Contribution

The paper presents a novel multi-modal dataset and a knowledge-infused pretraining model that significantly improve performance and generalization in fundus image analysis tasks.

Findings

01. Achieves state-of-the-art results on six unseen tasks
02. Demonstrates strong zero-shot and few-shot generalization
03. Introduces effective image-text knowledge infusion strategies

Abstract

Current fundus image analysis models are predominantly built for specific tasks and rely on individual datasets. The learning process is usually based on a data-driven paradigm without prior knowledge, resulting in poor transferability and generalizability. To address this issue, we propose MM-Retinal, a multi-modal dataset that encompasses high-quality image-text pairs collected from professional fundus diagram books. Moreover, enabled by MM-Retinal, we present a novel Knowledge-enhanced foundational pretraining model that incorporates Fundus Image-Text expertise, called KeepFIT. It is designed with image similarity-guided text revision and a mixed training strategy to infuse expert knowledge. Our proposed fundus foundation model achieves state-of-the-art performance across six unseen downstream tasks and shows excellent generalization ability in zero-shot and few-shot scenarios.…
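
The abstract mentions using image similarity to bring expert text into pretraining but does not spell out the mechanism. As a rough intuition only, the sketch below shows one generic way image similarity can select expert text: given an embedding of a query image, pick the caption of the most similar image in a small expert image-text collection. This is not the KeepFIT implementation; the function names, the toy 3-dimensional embeddings, and the captions are all hypothetical, and the use of cosine similarity is an assumption.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def nearest_expert_text(query_embedding, expert_pairs):
    """Return (caption, similarity) for the expert image closest to the query.

    expert_pairs is a list of (image_embedding, caption) tuples; both the
    structure and the name are hypothetical, chosen for illustration.
    """
    best = max(expert_pairs, key=lambda pair: cosine(query_embedding, pair[0]))
    return best[1], cosine(query_embedding, best[0])

# Toy embeddings; in practice these would come from an image encoder.
expert_pairs = [
    ([1.0, 0.0, 0.0], "hypothetical caption A"),
    ([0.0, 1.0, 0.0], "hypothetical caption B"),
]
caption, score = nearest_expert_text([0.9, 0.1, 0.0], expert_pairs)
print(caption, round(score, 3))
```

How the retrieved text would then be used to revise training text, and how it is mixed with other training data, is described in the paper and the official repository rather than here.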

Code & Models

Repositories

lxirich/mm-retinal
PyTorch · Official

Taxonomy

Topics: Intelligent Tutoring Systems and Adaptive Learning