SEA: Sentence Encoder Assembly for Video Retrieval by Textual Queries

Xirong Li; Fangming Zhou; Chaoxi Xu; Jiaqi Ji; Gang Yang

arXiv:2011.12091·cs.CV·November 25, 2020

TL;DR

This paper introduces SEA, a novel method for video retrieval using textual queries that leverages multiple sentence encoders across diverse common spaces, improving accuracy and robustness in cross-modal matching.

Contribution

SEA is the first to support multi-space matching with multi-loss learning, effectively exploiting diverse sentence encoders for better video retrieval performance.

Findings

01. SEA outperforms state-of-the-art methods on four benchmarks.

02. Multi-space multi-loss learning enhances matching accuracy.

03. SEA is simple to implement and adaptable to new encoders.

Abstract

Retrieving unlabeled videos by textual queries, known as Ad-hoc Video Search (AVS), is a core theme in multimedia data management and retrieval. The success of AVS depends on cross-modal representation learning that encodes both query sentences and videos into common spaces for semantic similarity computation. Inspired by the initial success of a few previous works in combining multiple sentence encoders, this paper takes a step forward by developing a new and general method for effectively exploiting diverse sentence encoders. The novelty of the proposed method, which we term Sentence Encoder Assembly (SEA), is two-fold. First, different from prior art that uses only a single common space, SEA supports text-video matching in multiple encoder-specific common spaces. Such a property prevents the matching from being dominated by a specific encoder that produces an encoding vector much…
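The multi-space matching idea described in the abstract can be illustrated with a minimal sketch. It assumes the sentence and the video have already been projected into each encoder-specific common space, and that the per-space cosine similarities are combined by a plain average; the abstract does not state the combination rule, so the averaging step and the function names `cosine` and `multi_space_similarity` are hypothetical and are not taken from the authors' repository.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


def multi_space_similarity(text_embs, video_embs):
    """Score one sentence against one video across several common spaces.

    text_embs[k] and video_embs[k] are the sentence and video embeddings
    in the k-th encoder-specific space. Averaging the per-space scores is
    an assumption made for this sketch, not the paper's stated rule.
    """
    if not text_embs or len(text_embs) != len(video_embs):
        raise ValueError("need one text and one video embedding per space")
    sims = [cosine(t, v) for t, v in zip(text_embs, video_embs)]
    return sum(sims) / len(sims)


# Two toy spaces: a perfect match in the first, orthogonal in the second.
score = multi_space_similarity(
    text_embs=[[1.0, 0.0], [0.0, 100.0]],
    video_embs=[[1.0, 0.0], [100.0, 0.0]],
)
print(score)
```

Because each space contributes a cosine score bounded in [-1, 1], no single space can outweigh the others through the size or magnitude of its embedding alone, which is the intuition behind matching in separate spaces rather than in one concatenated space.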

Code & Models

Repositories

li-xirong/sea
PyTorch · Official

Taxonomy

Topics: Multimodal Machine Learning Applications · Human Pose and Action Recognition · Domain Adaptation and Few-Shot Learning