Continual Retinal Vision-Language Pre-training upon Incremental Imaging Modalities
Yuang Yao, Ruiqi Wu, Yi Zhou, and Tao Zhou

TL;DR
RetCoP is a novel continual vision-language pre-training framework for fundus imaging that incrementally integrates multiple modalities, effectively mitigating catastrophic forgetting and enhancing model generalization in dynamic environments.
Contribution
This work introduces the first continual fundus vision-language pre-training framework, RetCoP, with a rehearsal strategy and off-diagonal information distillation to handle incremental modalities.
Findings
RetCoP outperforms existing methods in generalization.
RetCoP achieves the lowest forgetting rate.
RetCoP effectively integrates multiple fundus modalities.
Abstract
Traditional fundus image analysis models focus on single-modal tasks, ignoring fundus modality complementarity, which limits their versatility. Recently, retinal foundation models have emerged, but most still remain modality-specific. Integrating multiple fundus imaging modalities into a single foundation model is valuable. However, in dynamic environments, data from different modalities often arrive incrementally, necessitating continual pre-training. To address this, we propose RetCoP, the first continual vision-language pre-training framework in the fundus domain, which incrementally integrates image and text features from different imaging modalities into a single unified foundation model. To mitigate catastrophic forgetting in continual pre-training, we introduce a rehearsal strategy utilizing representative image-text pairs and an off-diagonal information distillation approach.…
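The abstract names off-diagonal information distillation but the excerpt above does not spell out its form. The following is only a minimal sketch of one plausible reading, assuming the loss compares how the previous and current models distribute similarity over the non-matching (off-diagonal) image-text pairs in a batch, using a KL divergence. The function names, the temperature value, and the choice of KL divergence are all assumptions for illustration, not the authors' implementation.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale each embedding to unit length so dot products are cosine similarities."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def off_diagonal_log_probs(img, txt, temperature=0.07):
    """Row-wise log-softmax over image-text similarities with matched pairs removed.

    img, txt: arrays of shape (n, d); row i of img is paired with row i of txt.
    Returns an (n, n - 1) array of log-probabilities over non-matching texts.
    The temperature of 0.07 is an assumed value, not taken from the paper.
    """
    sim = l2_normalize(img) @ l2_normalize(txt).T / temperature
    n = sim.shape[0]
    mask = ~np.eye(n, dtype=bool)           # True everywhere except the diagonal
    off = sim[mask].reshape(n, n - 1)       # keep only off-diagonal entries per row
    off = off - off.max(axis=1, keepdims=True)  # stabilise the softmax
    return off - np.log(np.exp(off).sum(axis=1, keepdims=True))


def off_diagonal_distillation_loss(old_img, old_txt, new_img, new_txt, temperature=0.07):
    """Mean KL(old || new) between off-diagonal similarity distributions (hypothetical form)."""
    log_p_old = off_diagonal_log_probs(old_img, old_txt, temperature)
    log_p_new = off_diagonal_log_probs(new_img, new_txt, temperature)
    kl = (np.exp(log_p_old) * (log_p_old - log_p_new)).sum(axis=1)
    return float(kl.mean())
```

In a continual pre-training loop, a term like this would typically be added to the contrastive objective on the new modality, with the "old" embeddings produced by a frozen copy of the model from the previous stage. Whether RetCoP combines the terms this way is not stated in the excerpt.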
Taxonomy
Topics: Retinal Imaging and Analysis
