CLIPTrans: Transferring Visual Knowledge with Pre-trained Models for Multimodal Machine Translation

Devaansh Gupta; Siddhant Kharbanda; Jiawei Zhou; Wanhua Li; Hanspeter Pfister; Donglai Wei

arXiv:2308.15226·cs.CV·August 30, 2023

TL;DR

CLIPTrans leverages pre-trained multilingual and multimodal models to improve multimodal machine translation, achieving state-of-the-art results without complex new modules by aligning embedding spaces through a lightweight mapping.
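
The "lightweight mapping" can be pictured as a small network that turns a single CLIP-style embedding into a few pseudo-token vectors, which the translation model then reads alongside its ordinary token embeddings. The sketch below is only an illustration and assumes a ClipCap-style two-layer MLP prefix. The class `PrefixMapper`, the helper `prepend_prefix`, the tanh activation, and every dimension are hypothetical choices. It is written in dependency-free Python rather than the PyTorch used by the official repository, and it does not reproduce the CLIPTrans implementation.

```python
import math
import random
from typing import List

Vector = List[float]
Matrix = List[Vector]


def linear(x: Vector, weight: Matrix, bias: Vector) -> Vector:
    """Return weight @ x + bias, with weight stored as a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weight, bias)]


class PrefixMapper:
    """Hypothetical lightweight mapping network (ClipCap-style MLP).

    Maps one d_clip-dimensional embedding to prefix_len vectors of size
    d_model, i.e. into the translation model's token-embedding space.
    """

    def __init__(self, d_clip: int, d_model: int, prefix_len: int,
                 hidden: int, seed: int = 0) -> None:
        rng = random.Random(seed)

        def init(rows: int, cols: int) -> Matrix:
            # Uniform initialization scaled by fan-in.
            bound = 1.0 / math.sqrt(cols)
            return [[rng.uniform(-bound, bound) for _ in range(cols)] for _ in range(rows)]

        self.d_model = d_model
        self.prefix_len = prefix_len
        self.w1, self.b1 = init(hidden, d_clip), [0.0] * hidden
        self.w2 = init(prefix_len * d_model, hidden)
        self.b2 = [0.0] * (prefix_len * d_model)

    def __call__(self, clip_embedding: Vector) -> Matrix:
        # Two-layer MLP with a tanh non-linearity between the layers.
        h = [math.tanh(v) for v in linear(clip_embedding, self.w1, self.b1)]
        flat = linear(h, self.w2, self.b2)
        # Reshape the flat output into prefix_len rows of width d_model.
        d = self.d_model
        return [flat[i * d:(i + 1) * d] for i in range(self.prefix_len)]


def prepend_prefix(prefix: Matrix, token_embeddings: Matrix) -> Matrix:
    """Concatenate the mapped prefix in front of the token embeddings."""
    return prefix + token_embeddings


mapper = PrefixMapper(d_clip=8, d_model=4, prefix_len=3, hidden=16)
prefix = mapper([0.1] * 8)
sequence = prepend_prefix(prefix, [[0.0] * 4 for _ in range(5)])
print(len(prefix), len(sequence))  # prints "3 8"
```

Only the small mapper would need training in this design. The pre-trained encoders stay reusable because the mapper adapts to their embedding spaces, not the other way round.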

Contribution

CLIPTrans introduces a simple adaptation method that aligns existing pre-trained models for multimodal translation and is especially effective in low-resource language scenarios.

Findings

01

Achieves an average improvement of +2.67 BLEU over previous state-of-the-art results on standard MMT benchmarks.

02

Effectively aligns multilingual and multimodal embeddings.

03

Demonstrates strong generalization in low-resource settings.

Abstract

There has been a growing interest in developing multimodal machine translation (MMT) systems that enhance neural machine translation (NMT) with visual knowledge. This problem setup involves using images as auxiliary information during training, and more recently, eliminating their use during inference. Towards this end, previous works face a challenge in training powerful MMT models from scratch due to the scarcity of annotated multilingual vision-language data, especially for low-resource languages. Simultaneously, there has been an influx of multilingual pre-trained models for NMT and multimodal pre-trained models for vision-language tasks, primarily in English, which have shown exceptional generalisation ability. However, these are not directly applicable to MMT since they do not provide aligned multimodal multilingual features for generative tasks. To alleviate this issue, instead…
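
The setup described above, with images as an auxiliary signal during training and none at inference, can be sketched as a switch over what conditions the translation model. This sketch assumes a CLIP-style shared space in which a caption's text embedding lies close to its image's embedding, so a text embedding of the source sentence can stand in for the missing image at test time. `Example`, `toy_text_encoder`, and `conditioning_vector` are hypothetical names. The toy encoder is only a placeholder for a real multilingual CLIP text encoder, and this is not necessarily how CLIPTrans itself switches inputs.

```python
import math
from dataclasses import dataclass
from typing import Callable, List, Optional

Vector = List[float]


@dataclass
class Example:
    source: str
    target: str
    # Precomputed image embedding; None when no image is available.
    image_embedding: Optional[Vector] = None


def toy_text_encoder(text: str, dim: int = 4) -> Vector:
    """Hypothetical stand-in for a multilingual CLIP text encoder.

    Buckets character codes into `dim` slots and L2-normalizes the result.
    """
    vec = [0.0] * dim
    for i, ch in enumerate(text):
        vec[i % dim] += ord(ch)
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else vec


def conditioning_vector(example: Example, training: bool,
                        encode_text: Callable[[str], Vector] = toy_text_encoder) -> Vector:
    """Pick the vector that conditions the translation model.

    Training with an image: use the visual embedding.
    Otherwise (including all inference): use the source-text embedding.
    """
    if training and example.image_embedding is not None:
        return example.image_embedding
    return encode_text(example.source)


ex = Example(source="ein Hund", target="a dog", image_embedding=[1.0, 0.0, 0.0, 0.0])
train_vec = conditioning_vector(ex, training=True)   # the image embedding
test_vec = conditioning_vector(ex, training=False)   # text embedding; image ignored
```

Keeping the switch outside the model means one translation network serves both phases; only the source of the conditioning vector changes.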

Code & Models

Repositories

devaansh100/cliptrans
PyTorch · Official

Videos

CLIPTrans: Transferring Visual Knowledge with Pre-trained Models for Multimodal Machine Translation (YouTube)

Taxonomy

Topics: Natural Language Processing Techniques · Multimodal Machine Learning Applications · Topic Modeling

Methods: ALIGN · mBART