Embracing Language Inclusivity and Diversity in CLIP through Continual Language Learning

Bang Yang; Yong Dai; Xuxin Cheng; Yaowei Li; Asif Raza; Yuexian Zou

arXiv:2401.17186 · cs.CV · January 31, 2024 · 1 citation

TL;DR

This paper introduces CLL-CLIP, a continual learning approach for multilingual vision-language models that incrementally expands language capabilities without catastrophic forgetting, demonstrated on a new 36-language benchmark.

Contribution

The paper proposes a continual language learning method for vision-language (VL) models that combines an expandable token embedding layer with regularization techniques to prevent forgetting.
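To make the idea of an expandable token embedding layer concrete, here is a minimal pure-Python sketch. It is not the paper's implementation: the class name `ExpandableEmbedding`, the Gaussian initialization, and the per-row `trainable` flag (marking earlier rows as frozen) are all illustrative assumptions. The only point it shows is that new-language tokens can be appended without altering the rows already learned.

```python
import random


class ExpandableEmbedding:
    """Hypothetical embedding table that grows as new languages arrive."""

    def __init__(self, vocab, dim, seed=0):
        self.dim = dim
        self.rng = random.Random(seed)
        self.token_to_id = {}
        self.rows = []        # one vector (list of floats) per token
        self.trainable = []   # assumption: rows of earlier tokens stay frozen
        self.expand(vocab, trainable=False)

    def expand(self, tokens, trainable=True):
        """Append rows for unseen tokens; existing rows are left untouched."""
        added = []
        for tok in tokens:
            if tok in self.token_to_id:
                continue  # token already has a row, reuse it
            self.token_to_id[tok] = len(self.rows)
            # Assumed initialization: small random values.
            self.rows.append([self.rng.gauss(0.0, 0.02) for _ in range(self.dim)])
            self.trainable.append(trainable)
            added.append(tok)
        return added

    def lookup(self, tokens):
        """Return the embedding vectors for a list of known tokens."""
        return [self.rows[self.token_to_id[t]] for t in tokens]


emb = ExpandableEmbedding(["the", "cat"], dim=4)
snapshot = [list(row) for row in emb.rows]
new_tokens = emb.expand(["cat", "gato", "el"])  # "cat" is already present
print(new_tokens, len(emb.rows))
```

In this sketch only the appended rows are flagged as trainable, so an optimizer that respects the flag would leave the original vectors unchanged.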

Findings

1. CLL-CLIP improves multilingual image-text retrieval performance.

2. The approach boosts state-of-the-art methods by up to 6.7% in Recall@1.

3. The authors constructed a comprehensive 36-language benchmark for evaluation.
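For readers unfamiliar with the Recall@1 metric mentioned in the findings, the following is a small sketch of how Recall@K is commonly computed for retrieval. It assumes a square similarity matrix in which the correct candidate for query `i` is candidate `i`; the function name `recall_at_k` and the toy scores are illustrative and are not taken from the paper or its benchmark.

```python
def recall_at_k(similarity, k=1):
    """Fraction of queries whose ground-truth candidate ranks in the top k.

    similarity[i][j] is the score between query i and candidate j.
    Assumption: the correct match for query i is candidate i.
    """
    hits = 0
    for i, row in enumerate(similarity):
        # Candidate indices sorted from highest to lowest score.
        ranked = sorted(range(len(row)), key=lambda j: row[j], reverse=True)
        if i in ranked[:k]:
            hits += 1
    return hits / len(similarity)


# Toy scores for three queries (rows) against three candidates (columns).
sim = [
    [0.9, 0.1, 0.3],  # query 0: correct candidate ranked first
    [0.2, 0.4, 0.8],  # query 1: correct candidate ranked second
    [0.1, 0.2, 0.7],  # query 2: correct candidate ranked first
]
r1 = recall_at_k(sim, k=1)  # 2 of 3 queries hit
r2 = recall_at_k(sim, k=2)  # all 3 queries hit
print(r1, r2)
```

In a multilingual setting this value would typically be computed per language and then averaged, though the exact protocol is defined by the benchmark rather than by this sketch.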

Abstract

While vision-language pre-trained models (VL-PTMs) have advanced multimodal research in recent years, their mastery in a few languages like English restricts their applicability in broader communities. To this end, there is an increasing interest in developing multilingual VL models via a joint-learning setup, which, however, could be unrealistic due to expensive costs and data availability. In this work, we propose to extend VL-PTMs' language capacity by continual language learning (CLL), where a model needs to update its linguistic knowledge incrementally without suffering from catastrophic forgetting (CF). We begin our study by introducing a model dubbed CLL-CLIP, which builds upon CLIP, a prevailing VL-PTM that has acquired image-English text alignment. Specifically, CLL-CLIP contains an expandable token embedding layer to handle linguistic differences. It solely trains token…

Code & Models

Repositories

yangbang18/clfm (PyTorch, official)

Taxonomy

Topics: Second Language Learning and Teaching · EFL/ESL Teaching and Learning

Methods: Contrastive Language-Image Pre-training