Stop Pre-Training: Adapt Visual-Language Models to Unseen Languages

Yasmine Karoui; Rémi Lebret; Negar Foroutan; Karl Aberer

arXiv:2306.16774 · cs.CL · June 30, 2023

TL;DR

This paper introduces a simple method to adapt vision-language models to unseen languages using multilingual pre-trained language models and machine translation, improving performance without needing target language data.

Contribution

It proposes a cross-lingual token embedding alignment approach that enables VLP models to work effectively in unseen languages without large parallel corpora.

Findings

01 Outperforms state-of-the-art multilingual vision-language models

02 Effective across image-text retrieval, visual entailment, and visual reasoning tasks

03 Does not require target language data or large parallel corpora

Abstract

Vision-Language Pre-training (VLP) has advanced the performance of many vision-language tasks, such as image-text retrieval, visual entailment, and visual reasoning. The pre-training mostly utilizes lexical databases and image queries in English. Previous work has demonstrated that the pre-training in English does not transfer well to other languages in a zero-shot setting. However, multilingual pre-trained language models (MPLM) have excelled at a variety of single-modal language tasks. In this paper, we propose a simple yet efficient approach to adapt VLP to unseen languages using MPLM. We utilize a cross-lingual contextualized token embeddings alignment approach to train text encoders for non-English languages. Our approach does not require image input and primarily uses machine translation, eliminating the need for target language data. Our evaluation across three distinct tasks…
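
To make the idea of cross-lingual token embedding alignment concrete, here is a minimal sketch of what such an objective could look like. It is not the authors' implementation. The abstract does not specify the loss, so the mean squared error used here is an assumption, as are the function name `alignment_loss`, the toy embeddings, and the premise that word-level alignments between an English sentence and its machine translation come from an external word aligner.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def alignment_loss(
    teacher_tokens: List[Vector],
    student_tokens: List[Vector],
    aligned_pairs: List[Tuple[int, int]],
) -> float:
    """Mean squared distance between aligned token embeddings.

    teacher_tokens: contextualized embeddings of the English sentence,
        assumed to come from the frozen text encoder of a VLP model.
    student_tokens: contextualized embeddings of the translated sentence,
        assumed to come from a multilingual encoder being trained.
    aligned_pairs: (english_index, target_index) word alignments,
        assumed to be produced by an external word aligner.
    """
    if not aligned_pairs:
        return 0.0
    total = 0.0
    for en_idx, tgt_idx in aligned_pairs:
        teacher = teacher_tokens[en_idx]
        student = student_tokens[tgt_idx]
        # Squared error averaged over the embedding dimensions.
        total += sum((t - s) ** 2 for t, s in zip(teacher, student)) / len(teacher)
    # Average over all aligned token pairs.
    return total / len(aligned_pairs)


# Toy example: the target sentence has the same two tokens in reversed order.
teacher = [[1.0, 0.0], [0.0, 1.0]]
student = [[0.0, 1.0], [1.0, 0.0]]

matched = alignment_loss(teacher, student, [(0, 1), (1, 0)])
mismatched = alignment_loss(teacher, student, [(0, 0), (1, 1)])
print(matched, mismatched)
```

In this sketch only text is involved: minimizing the loss pulls the student's token embeddings toward the teacher's, which matches the abstract's statement that the approach needs no image input. In a real setup the embeddings would be tensors and the loss would be minimized with gradient descent over the student encoder's parameters.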

Code & Models

Repositories

yasminekaroui/clicotea (PyTorch, official)

Taxonomy

Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Topic Modeling