Continual learning in cross-modal retrieval
Kai Wang, Luis Herranz, Joost van de Weijer

TL;DR
This paper introduces a continual learning framework for cross-modal retrieval, addressing how to prevent forgetting in shared embedding spaces when learning new tasks involving language and visual data.
Contribution
It proposes a novel continual cross-modal retrieval setting and a framework that decouples training, indexing, and querying to mitigate forgetting.
Findings
Avoiding reindexing improves retrieval performance.
Significant gains over the fine-tuning baseline.
Indexing stage is crucial for preventing forgetting.
Abstract
Multimodal representations and continual learning are two areas closely related to human intelligence. The former considers the learning of shared representation spaces where information from different modalities can be compared and integrated (we focus on cross-modal retrieval between language and visual representations). The latter studies how to prevent forgetting a previously learned task when learning a new one. While humans excel in these two aspects, deep neural networks are still quite limited. In this paper, we propose combining both problems in a continual cross-modal retrieval setting, where we study how the catastrophic interference caused by new tasks impacts the embedding spaces and their cross-modal alignment required for effective retrieval. We propose a general framework that decouples the training, indexing and querying stages. We also identify and study…
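The separation of training, indexing and querying can be illustrated with a toy sketch. It is not the authors' implementation: the class `DecoupledIndex`, the two-dimensional "encoders", and the simulated drift (an axis swap standing in for catastrophic interference in the visual encoder) are all hypothetical, and the text encoder is assumed not to change between tasks. The sketch only shows the mechanics behind the finding that avoiding reindexing can help. Embeddings stored after the first task stay aligned with the query, while rebuilding the index with the drifted encoder breaks that alignment.

```python
import math
from typing import Callable, Dict, List, Sequence

Vector = List[float]
Encoder = Callable[[Sequence[float]], Vector]


def normalize(v: Sequence[float]) -> Vector:
    """Scale a vector to unit L2 norm so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


class DecoupledIndex:
    """Hypothetical helper: indexing is a separate stage from training and querying."""

    def __init__(self) -> None:
        self.embeddings: Dict[str, Vector] = {}  # item id -> stored visual embedding

    def index_items(self, items: Dict[str, Sequence[float]], image_encoder: Encoder) -> None:
        # Indexing stage: embed each database item with the encoder available now
        # and store the result; later training does not modify these vectors.
        for item_id, features in items.items():
            self.embeddings[item_id] = normalize(image_encoder(features))

    def query(self, text_features: Sequence[float], text_encoder: Encoder, k: int = 1) -> List[str]:
        # Querying stage: embed the query and rank stored items by cosine similarity.
        q = normalize(text_encoder(text_features))
        ranked = sorted(self.embeddings, key=lambda i: dot(q, self.embeddings[i]), reverse=True)
        return ranked[:k]


# Toy encoders (assumptions): the text encoder stays fixed across tasks.
def text_encoder(x: Sequence[float]) -> Vector:
    return list(x)


def image_encoder_task1(x: Sequence[float]) -> Vector:
    return list(x)


def image_encoder_task2(x: Sequence[float]) -> Vector:
    # Simulated drift after training on a second task: the visual axes are swapped.
    return [x[1], x[0]]


items = {"cat_photo": [1.0, 0.0], "dog_photo": [0.0, 1.0]}
query = [0.9, 0.1]  # a caption that should retrieve "cat_photo"

kept = DecoupledIndex()
kept.index_items(items, image_encoder_task1)  # indexed once, after task 1
# ... training on task 2 happens here; the stored index is not rebuilt ...
print(kept.query(query, text_encoder))  # ['cat_photo']

reindexed = DecoupledIndex()
reindexed.index_items(items, image_encoder_task2)  # rebuilt with the drifted encoder
print(reindexed.query(query, text_encoder))  # ['dog_photo']
```

In this toy setting the drift affects only the visual side, so keeping the old index preserves retrieval. In a real continual setting both encoders may change, and whether a stored index remains aligned with new queries depends on how training is constrained.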
