Continual Retinal Vision-Language Pre-training upon Incremental Imaging Modalities

Yuang Yao, Ruiqi Wu, Yi Zhou, and Tao Zhou

arXiv:2506.19320·cs.CV·June 25, 2025

Open Access

TL;DR

RetCoP is a novel continual vision-language pre-training framework for fundus imaging that incrementally integrates multiple modalities, effectively mitigating catastrophic forgetting and enhancing model generalization in dynamic environments.

Contribution

This work introduces the first continual fundus vision-language pre-training framework, RetCoP, with a rehearsal strategy and off-diagonal information distillation to handle incremental modalities.

Findings

01. RetCoP outperforms existing methods in generalization.
02. RetCoP achieves the lowest forgetting rate.
03. RetCoP effectively integrates multiple fundus modalities.

Abstract

Traditional fundus image analysis models focus on single-modal tasks and ignore the complementarity between fundus modalities, which limits their versatility. Retinal foundation models have recently emerged, but most remain modality-specific. Integrating multiple fundus imaging modalities into a single foundation model is valuable. However, in dynamic environments, data from different modalities often arrive incrementally, necessitating continual pre-training. To address this, we propose RetCoP, the first continual vision-language pre-training framework in the fundus domain, which incrementally integrates image and text features from different imaging modalities into a single unified foundation model. To mitigate catastrophic forgetting in continual pre-training, we introduce a rehearsal strategy utilizing representative image-text pairs and an off-diagonal information distillation approach.…
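
The abstract names off-diagonal information distillation without giving its formula here, so the following is only an illustrative sketch of one plausible reading, not the authors' implementation. It assumes the technique compares the image-text similarity matrix of a frozen previous-stage model with that of the current model on the same batch, and penalizes divergence between their off-diagonal (non-matching pair) entries. The function names, the temperature value, and the use of a KL divergence are all assumptions made for illustration.

```python
import numpy as np


def similarity_matrix(image_emb, text_emb):
    """Cosine similarity between every image and every text in a batch."""
    image_emb = image_emb / np.linalg.norm(image_emb, axis=1, keepdims=True)
    text_emb = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    return image_emb @ text_emb.T


def off_diagonal_distribution(sim, temperature=0.1):
    """Row-wise softmax over the off-diagonal entries only.

    The diagonal (matched image-text pairs) is dropped, so each row becomes
    a distribution over the n - 1 non-matching texts.
    """
    n = sim.shape[0]
    mask = ~np.eye(n, dtype=bool)
    logits = sim[mask].reshape(n, n - 1) / temperature
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    probs = np.exp(logits)
    return probs / probs.sum(axis=1, keepdims=True)


def off_diagonal_distillation_loss(old_sim, new_sim, temperature=0.1):
    """Mean KL(old || new) between off-diagonal distributions (assumed form)."""
    p_old = off_diagonal_distribution(old_sim, temperature)
    p_new = off_diagonal_distribution(new_sim, temperature)
    eps = 1e-12
    kl = np.sum(p_old * (np.log(p_old + eps) - np.log(p_new + eps)), axis=1)
    return float(np.mean(kl))
```

In a continual pre-training loop, a term like this would typically be computed on rehearsal pairs from earlier modalities and added to the contrastive loss for the new modality. How the paper selects representative pairs and weights the terms is not stated in the text above.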

Taxonomy

Topics: Retinal Imaging and Analysis