MM-Retinal V2: Transfer an Elite Knowledge Spark into Fundus Vision-Language Pretraining
Ruiqi Wu, Na Su, Chenran Zhang, Tengfei Ma, Tao Zhou, Zhiting Cui, Nianfeng Tang, Tianyu Mao, Yi Zhou, Wen Fan, Tianxing Wu, Shenqi Jing, Huazhu Fu

TL;DR
This paper introduces MM-Retinal V2, a high-quality fundus image-text dataset, and KeepFIT V2, a vision-language pretraining model that transfers knowledge from this elite dataset into pretraining on categorical public datasets to improve fundus image analysis.
Contribution
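The work presents MM-Retinal V2, an image-text paired dataset covering CFP, FFA, and OCT modalities, and KeepFIT V2, a pretraining method that combines preliminary textual pretraining of the text encoder with a hybrid image-text knowledge injection module. Together they improve fundus vision-language models without relying on large private image-text datasets.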
Findings
Achieves performance competitive with state-of-the-art models
Demonstrates strong generalization in zero-shot and few-shot tasks
Provides a publicly available dataset and model for research
Abstract
Vision-language pretraining (VLP) has been investigated to generalize across diverse downstream tasks for fundus image analysis. Although recent methods showcase promising achievements, they significantly rely on large-scale private image-text data but pay less attention to the pretraining manner, which limits their further advancements. In this work, we introduce MM-Retinal V2, a high-quality image-text paired dataset comprising CFP, FFA, and OCT image modalities. Then, we propose a novel fundus vision-language pretraining model, namely KeepFIT V2, which is pretrained by integrating knowledge from the elite data spark into categorical public datasets. Specifically, a preliminary textual pretraining is adopted to equip the text encoder with primarily ophthalmic textual knowledge. Moreover, a hybrid image-text knowledge injection module is designed for knowledge transfer, which is…
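The abstract is cut off before it describes the training objective, so the sketch below is not KeepFIT V2's actual loss. It only illustrates the generic symmetric image-text contrastive objective (the kind of contrastive learning this page lists under Methods) that vision-language pretraining commonly builds on. The function names, the temperature value of 0.07, and the toy embeddings are all assumptions made for illustration.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (assumes the vector is non-zero)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def cross_entropy_diagonal(logits):
    """Mean softmax cross-entropy where row i's correct class is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        row_max = max(row)  # subtract the max for numerical stability
        log_sum_exp = row_max + math.log(sum(math.exp(v - row_max) for v in row))
        total += log_sum_exp - row[i]
    return total / len(logits)


def symmetric_contrastive_loss(image_embs, text_embs, temperature=0.07):
    """Generic image-text contrastive loss over a batch of paired embeddings.

    image_embs[i] and text_embs[i] are assumed to describe the same sample.
    The temperature default is an illustrative choice, not taken from the paper.
    """
    imgs = [l2_normalize(v) for v in image_embs]
    txts = [l2_normalize(v) for v in text_embs]
    # Cosine similarity between every image and every text, scaled by temperature.
    logits = [
        [sum(a * b for a, b in zip(img, txt)) / temperature for txt in txts]
        for img in imgs
    ]
    # Transpose to score each text against every image.
    logits_t = [list(col) for col in zip(*logits)]
    # Average the image-to-text and text-to-image directions.
    return 0.5 * (cross_entropy_diagonal(logits) + cross_entropy_diagonal(logits_t))


# Toy batch: two image embeddings and two text embeddings.
aligned = symmetric_contrastive_loss([[1.0, 0.0], [0.0, 1.0]],
                                     [[1.0, 0.0], [0.0, 1.0]])
swapped = symmetric_contrastive_loss([[1.0, 0.0], [0.0, 1.0]],
                                     [[0.0, 1.0], [1.0, 0.0]])
```

In this toy batch the loss is close to zero when each image embedding matches its paired text embedding and becomes large when the pairs are swapped, which is the behavior a contrastive objective uses to pull matched image-text pairs together.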
Taxonomy
Topics: Text Readability and Simplification
Methods: Softmax · Attention Is All You Need · Contrastive Learning
