Cross-View Language Modeling: Towards Unified Cross-Lingual Cross-Modal Pre-training

Yan Zeng; Wangchunshu Zhou; Ao Luo; Ziming Cheng; Xinsong Zhang

arXiv:2206.00621 · cs.CL · June 13, 2023 · 5 cites

TL;DR

This paper proposes a unified pre-training framework called Cross-View Language Modeling that aligns multi-lingual and multi-modal data in a shared semantic space, significantly improving performance on cross-lingual and cross-modal tasks.

Contribution

It introduces a novel cross-view language modeling framework that unifies cross-lingual and cross-modal pre-training with shared architectures and objectives.

Findings

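01  CCLM outperforms the previous state of the art with an average absolute improvement of over 10%.

02  CCLM is the first multi-lingual multi-modal pre-trained model to surpass the translate-test performance of representative English vision-language models via zero-shot cross-lingual transfer.

03  The improvements are measured on IGLUE, a multi-lingual multi-modal benchmark, and on two multi-lingual image-text retrieval datasets.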

Abstract

In this paper, we introduce Cross-View Language Modeling, a simple and effective pre-training framework that unifies cross-lingual and cross-modal pre-training with shared architectures and objectives. Our approach is motivated by a key observation that cross-lingual and cross-modal pre-training share the same goal of aligning two different views of the same object into a common semantic space. To this end, the cross-view language modeling framework considers both multi-modal data (i.e., image-caption pairs) and multi-lingual data (i.e., parallel sentence pairs) as two different views of the same object, and trains the model to align the two views by maximizing the mutual information between them with conditional masked language modeling and contrastive learning. We pre-train CCLM, a Cross-lingual Cross-modal Language Model, with the cross-view language modeling framework. Empirical…
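
The contrastive half of the objective described in the abstract can be illustrated with a symmetric InfoNCE loss, which is a standard way to maximize a lower bound on the mutual information between two paired views. The following is a minimal sketch under stated assumptions, not the CCLM implementation: the function names, the temperature of 0.07, and the toy two-dimensional embeddings are all hypothetical, and plain Python lists stand in for encoder outputs. Row i of each batch is assumed to embed one view of the same object (an image and its caption, or a sentence and its translation).

```python
import math


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def l2_normalize(v):
    """Scale a vector to unit length (zero vectors are left unchanged)."""
    norm = math.sqrt(dot(v, v)) or 1.0
    return [x / norm for x in v]


def log_softmax(row):
    """Numerically stable log-softmax over one row of scores."""
    m = max(row)
    lse = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - lse for x in row]


def info_nce(view_a, view_b, temperature=0.07):
    """Symmetric InfoNCE loss over a batch of paired embeddings.

    view_a[i] and view_b[i] are assumed to be two views of the same
    object; every other pairing in the batch acts as a negative.
    The temperature value is an illustrative assumption.
    """
    a = [l2_normalize(v) for v in view_a]
    b = [l2_normalize(v) for v in view_b]
    n = len(a)
    # Cosine similarity of every a[i] with every b[j], scaled by temperature.
    sim = [[dot(a[i], b[j]) / temperature for j in range(n)] for i in range(n)]
    sim_t = [[sim[i][j] for i in range(n)] for j in range(n)]
    # Cross-entropy with the matching index as the target, in both directions.
    loss_ab = -sum(log_softmax(sim[i])[i] for i in range(n)) / n
    loss_ba = -sum(log_softmax(sim_t[j])[j] for j in range(n)) / n
    return 0.5 * (loss_ab + loss_ba)


# Toy usage: aligned pairs should score a lower loss than shuffled pairs.
captions = [[1.0, 0.0], [0.0, 1.0]]
images_aligned = [[2.0, 0.0], [0.0, 3.0]]
images_shuffled = [[0.0, 1.0], [1.0, 0.0]]
print(info_nce(captions, images_aligned) < info_nce(captions, images_shuffled))
```

Because the same loss applies to any pair of views, the sketch takes no argument saying whether the pair is image-caption or sentence-translation, which mirrors the abstract's point that both kinds of data can share one objective. The conditional masked language modeling term is not shown here.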

Code & Models

Repositories

zengyan-97/cclm (PyTorch, official)

Taxonomy

Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Topic Modeling

Methods: ALIGN