MM-Retinal: Knowledge-Enhanced Foundational Pretraining with Fundus Image-Text Expertise
Ruiqi Wu, Chenran Zhang, Jianle Zhang, Yi Zhou, Tao Zhou, Huazhu Fu

TL;DR
This paper introduces MM-Retinal, a multi-modal fundus image-text dataset, and KeepFIT, a knowledge-enhanced pretraining model that leverages expert knowledge for improved transferability and generalization in fundus image analysis.
Contribution
The paper presents a novel multi-modal dataset and a knowledge-infused pretraining model that significantly improve performance and generalization in fundus image analysis tasks.
Findings
Achieves state-of-the-art results on six unseen tasks
Demonstrates strong zero-shot and few-shot generalization
Introduces effective image-text knowledge infusion strategies
Abstract
Current fundus image analysis models are predominantly built for specific tasks relying on individual datasets. The learning process is usually based on data-driven paradigm without prior knowledge, resulting in poor transferability and generalizability. To address this issue, we propose MM-Retinal, a multi-modal dataset that encompasses high-quality image-text pairs collected from professional fundus diagram books. Moreover, enabled by MM-Retinal, we present a novel Knowledge-enhanced foundational pretraining model which incorporates Fundus Image-Text expertise, called KeepFIT. It is designed with image similarity-guided text revision and mixed training strategy to infuse expert knowledge. Our proposed fundus foundation model achieves state-of-the-art performance across six unseen downstream tasks and holds excellent generalization ability in zero-shot and few-shot scenarios.…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Code & Models
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsIntelligent Tutoring Systems and Adaptive Learning
