Domain Adaptation in Multi-View Embedding for Cross-Modal Video Retrieval

Jonathan Munro; Michael Wray; Diane Larlus; Gabriela Csurka; Dima Damen

arXiv:2110.12812 · cs.CV · October 26, 2021

TL;DR

This paper introduces an unsupervised domain adaptation method for cross-modal video retrieval, aligning video embeddings across different domains to improve retrieval accuracy without requiring annotations in the target domain.

Contribution

It proposes a novel iterative domain alignment approach using pseudo-labeling and cross-domain ranking, specifically addressing the domain gap in uncaptioned video retrieval tasks.

Findings

01. Outperforms source-only training as well as marginal and conditional alignment methods

02. Effective for fine-grained action video retrieval

03. Establishes a new benchmark for unsupervised domain adaptation in cross-modal video retrieval

Abstract

Given a gallery of uncaptioned video sequences, this paper considers the task of retrieving videos based on their relevance to an unseen text query. To compensate for the lack of annotations, we rely instead on a related video gallery composed of video-caption pairs, termed the source gallery, albeit with a domain gap between its videos and those in the target gallery. We thus introduce the problem of Unsupervised Domain Adaptation for Cross-modal Video Retrieval, along with a new benchmark on fine-grained actions. We propose a novel iterative domain alignment method by means of pseudo-labelling target videos and cross-domain (i.e. source-target) ranking. Our approach adapts the embedding space to the target gallery, consistently outperforming source-only as well as marginal and conditional alignment methods.
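
The two ingredients named in the abstract, pseudo-labelling target videos and cross-domain ranking, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the cosine-similarity nearest-caption rule, the `keep_fraction` confidence filter, and the hinge `margin` are all assumptions chosen for illustration. The sketch assumes video and text embeddings already live in a shared space.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length so dot products are cosine similarities."""
    return x / np.maximum(np.linalg.norm(x, axis=1, keepdims=True), eps)


def pseudo_label_targets(target_video_emb, source_text_emb, keep_fraction=0.5):
    """Hypothetical pseudo-labelling step (an assumption, not the paper's rule).

    Each uncaptioned target video is assigned the index of its most similar
    source caption; only the most confident fraction of assignments is kept.
    Returns (indices of kept target videos, their pseudo-labels).
    """
    sim = l2_normalize(target_video_emb) @ l2_normalize(source_text_emb).T
    labels = sim.argmax(axis=1)
    confidence = sim.max(axis=1)
    n_keep = max(1, int(round(keep_fraction * len(labels))))
    # Stable sort keeps the original order among equally confident videos.
    keep = np.argsort(-confidence, kind="stable")[:n_keep]
    return keep, labels[keep]


def cross_domain_ranking_loss(target_video_emb, source_text_emb,
                              pseudo_labels, margin=0.2):
    """Hypothetical cross-domain (source-target) triplet hinge loss.

    Each target video should be more similar to its pseudo-labelled source
    caption than to every other source caption by at least `margin`.
    """
    sim = l2_normalize(target_video_emb) @ l2_normalize(source_text_emb).T
    rows = np.arange(sim.shape[0])
    positive = sim[rows, pseudo_labels][:, None]
    hinge = np.maximum(0.0, margin + sim - positive)
    hinge[rows, pseudo_labels] = 0.0  # a positive is not its own negative
    return float(hinge.mean())


# Toy data: three target videos and two source captions in a 2-D space.
target_videos = np.array([[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]])
source_captions = np.array([[1.0, 0.0], [0.0, 1.0]])

kept, labels = pseudo_label_targets(target_videos, source_captions,
                                    keep_fraction=1.0)
loss = cross_domain_ranking_loss(target_videos[kept], source_captions, labels)
```

In an iterative scheme of the kind the abstract describes, one would presumably alternate between re-estimating pseudo-labels and updating the embedding to reduce a ranking loss of this sort; how the paper schedules those steps is not stated in the abstract, so the sketch leaves that loop out.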

Taxonomy

Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Advanced Image and Video Retrieval Techniques