Expectation-Maximization Contrastive Learning for Compact Video-and-Language Representations

Peng Jin; Jinfa Huang; Fenglin Liu; Xian Wu; Shen Ge; Guoli Song; David A. Clifton; Jie Chen

arXiv:2211.11427 · cs.CV · November 22, 2022 · 35 cites

TL;DR

This paper introduces EMCL, a contrastive learning method that uses the Expectation-Maximization algorithm to learn compact, more discriminative video-and-language representations, significantly improving text-video retrieval performance.

Contribution

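The paper proposes EMCL, an approach that finds a compact set of bases for the shared latent space so that features can be represented as linear combinations of those bases. This reduces the rank of the latent space and increases its power to represent semantics in video-and-language tasks.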
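The basis decomposition can be written out as a worked equation. The notation below is an assumption introduced for illustration and follows the generic soft-EM formulation (soft assignments in the E-step, weighted means in the M-step); the paper's exact update rules, such as how the bases are initialized or whether they are normalized after each M-step, may differ. Let X (N × D) stack N video and text feature vectors x_n, let μ (K × D) stack K bases μ_k as rows with K much smaller than N, and let λ be a temperature-like scale.

```latex
\begin{aligned}
\text{E-step:}\quad & Z_{nk} = \frac{\exp\!\left(\lambda\, x_n^{\top}\mu_k\right)}{\sum_{j=1}^{K}\exp\!\left(\lambda\, x_n^{\top}\mu_j\right)}, \qquad Z \in \mathbb{R}^{N\times K} \\
\text{M-step:}\quad & \mu_k = \frac{\sum_{n=1}^{N} Z_{nk}\, x_n}{\sum_{n=1}^{N} Z_{nk}} \\
\text{Reconstruction:}\quad & \tilde{X} = Z\mu, \qquad \operatorname{rank}(\tilde{X}) \le \min\{\operatorname{rank}(Z), \operatorname{rank}(\mu)\} \le K
\end{aligned}
```

Because each row of Z is nonnegative and sums to 1, every reconstructed feature is a convex combination of the K bases. However many videos and captions are in the batch, the reconstructed features therefore span at most a K-dimensional subspace, which is the sense in which the latent space becomes compact.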

Findings

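01. Outperforms previous state-of-the-art methods across all metrics on three text-video retrieval benchmarks.

02. Learns more discriminative video-and-language representations, improving retrieval accuracy.

03. Can be incorporated into existing methods, either as a jointly trained layer or as an out-of-the-box inference module that requires no extra training.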
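An EM basis module has no learned parameters of its own, which is what makes inference-only use conceivable. The sketch below is a minimal NumPy illustration under stated assumptions, not the authors' implementation: it runs a few soft-EM iterations on the stacked, L2-normalized video and text embeddings of a batch, replaces each embedding with its reconstruction from the K bases, and scores text-video pairs by cosine similarity. The helper names `em_bases`, `l2_normalize`, and `em_similarity`, the random basis initialization, the per-iteration basis normalization, and the defaults for `num_bases`, `num_iters`, and `temperature` are all hypothetical.

```python
import numpy as np


def em_bases(features, num_bases=4, num_iters=9, temperature=0.1, seed=0):
    """Hypothetical helper: estimate K bases of `features` (N, D) with soft EM.

    Returns (bases (K, D), responsibilities (N, K), reconstruction (N, D)).
    """
    rng = np.random.default_rng(seed)
    n = features.shape[0]
    # Assumption: initialize the bases from K randomly chosen feature vectors.
    bases = features[rng.choice(n, size=num_bases, replace=False)].copy()
    for _ in range(num_iters):
        # E-step: softmax over bases of the scaled dot products.
        logits = features @ bases.T / temperature
        logits -= logits.max(axis=1, keepdims=True)  # numerical stability
        resp = np.exp(logits)
        resp /= resp.sum(axis=1, keepdims=True)  # each row sums to 1
        # M-step: each basis becomes the responsibility-weighted mean feature.
        bases = (resp.T @ features) / resp.sum(axis=0)[:, None]
        bases /= np.linalg.norm(bases, axis=1, keepdims=True)  # assumption
    # Each reconstructed row is a convex combination of the K bases.
    recon = resp @ bases
    return bases, resp, recon


def l2_normalize(x):
    # Scale every row to unit Euclidean length.
    return x / np.linalg.norm(x, axis=1, keepdims=True)


def em_similarity(video_emb, text_emb, **em_kwargs):
    """Hypothetical inference-only use: cosine similarities after EM reconstruction."""
    video = l2_normalize(video_emb)
    text = l2_normalize(text_emb)
    # Videos and texts are fitted with one shared set of bases.
    joint = np.concatenate([video, text], axis=0)
    _, _, recon = em_bases(joint, **em_kwargs)
    recon = l2_normalize(recon)
    video_rec, text_rec = recon[: len(video)], recon[len(video):]
    return text_rec @ video_rec.T  # shape (num_texts, num_videos)


# Usage with random stand-ins for frozen encoder outputs.
rng = np.random.default_rng(1)
videos = rng.normal(size=(8, 32))
texts = rng.normal(size=(8, 32))
sim = em_similarity(videos, texts, num_bases=4)
print(sim.shape)  # (8, 8)
```

Sharing one set of bases across both modalities is the design choice that matters in this sketch: fitting separate bases per modality would not place the video and text reconstructions in a common low-rank subspace.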

Abstract

Most video-and-language representation learning approaches employ contrastive learning, e.g., CLIP, to project video and text features into a common latent space according to the semantic similarities of text-video pairs. However, the shared latent spaces learned this way are often not optimal, and the modality gap between visual and textual representations cannot be fully eliminated. In this paper, we propose Expectation-Maximization Contrastive Learning (EMCL) to learn compact video-and-language representations. Specifically, we use the Expectation-Maximization algorithm to find a compact set of bases for the latent space, in which features can be concisely represented as linear combinations of these bases. Such a feature decomposition of video-and-language representations reduces the rank of the latent space, resulting in increased power to represent semantics. Extensive…
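The CLIP-style objective that the abstract takes as its starting point can be sketched as a symmetric InfoNCE loss over a batch of matched text-video pairs. This is a minimal NumPy illustration of the standard CLIP formulation, not EMCL itself and not the paper's code; the function names `log_softmax` and `symmetric_infonce` and the fixed `temperature` of 0.05 are assumptions (CLIP learns its temperature during training).

```python
import numpy as np


def log_softmax(x, axis):
    # Numerically stable log-softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    return x - np.log(np.exp(x).sum(axis=axis, keepdims=True))


def symmetric_infonce(video_emb, text_emb, temperature=0.05):
    """Hypothetical helper: CLIP-style loss where pair i is (text i, video i)."""
    v = video_emb / np.linalg.norm(video_emb, axis=1, keepdims=True)
    t = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    logits = t @ v.T / temperature  # entry (i, j): similarity of text i and video j
    diag = np.arange(len(v))
    # Text-to-video: softmax over videos for each text (rows).
    text_to_video = -log_softmax(logits, axis=1)[diag, diag].mean()
    # Video-to-text: softmax over texts for each video (columns).
    video_to_text = -log_softmax(logits, axis=0)[diag, diag].mean()
    return 0.5 * (text_to_video + video_to_text)


# Matched pairs give a near-zero loss; mismatched pairs are penalized.
emb = np.eye(4)
print(symmetric_infonce(emb, emb) < 1e-6)               # True
print(symmetric_infonce(emb, np.roll(emb, 1, axis=0)))  # about 20
```

This loss only shapes pairwise similarities between matched and unmatched pairs; it imposes no explicit constraint on the rank of the shared latent space, which is the property EMCL's basis decomposition targets.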

Videos

Expectation-Maximization Contrastive Learning for Compact Video-and-Language Representations (SlidesLive)

Taxonomy

Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning

Methods: Contrastive Learning · Contrastive Language-Image Pre-training