Dual Encoding for Video Retrieval by Text

Jianfeng Dong; Xirong Li; Chaoxi Xu; Xun Yang; Gang Yang; Xun Wang; Meng Wang

arXiv:2009.05381·cs.CV·February 19, 2021

TL;DR

This paper introduces a dual deep encoding network for text-to-video retrieval that uses multi-level encoding and hybrid space learning, significantly improving cross-modal matching performance.

Contribution

It proposes a novel multi-level encoding architecture combined with hybrid space learning for more effective video retrieval by text queries.

Findings

01. Outperforms existing methods on four challenging video datasets.

02. Demonstrates the effectiveness of multi-level encoding and hybrid space learning.

03. Achieves high accuracy in cross-modal video retrieval tasks.

Abstract

This paper attacks the challenging problem of video retrieval by text. In such a retrieval paradigm, an end user searches for unlabeled videos by ad-hoc queries described exclusively in the form of a natural-language sentence, with no visual example provided. Given videos as sequences of frames and queries as sequences of words, an effective sequence-to-sequence cross-modal matching is crucial. To that end, the two modalities need to be first encoded into real-valued vectors and then projected into a common space. In this paper we achieve this by proposing a dual deep encoding network that encodes videos and queries into powerful dense representations of their own. Our novelty is two-fold. First, different from prior art that resorts to a specific single-level encoder, the proposed network performs multi-level encoding that represents the rich content of both modalities in a…
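
The matching step described in the abstract (encode each modality into a real-valued vector, project both into a common space, then compare) can be illustrated with a minimal Python sketch. It is not the authors' network: the function names, the projection matrices `W_text` and `W_video`, and the toy feature vectors are hypothetical stand-ins for learned encoders, and cosine similarity is assumed here as the matching score.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale vectors along the last axis to (approximately) unit length."""
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)


def rank_videos(query_vec, video_vecs, W_text, W_video):
    """Rank videos for one text query in a shared space.

    query_vec:  (d_text,)     hypothetical encoded query
    video_vecs: (n, d_video)  hypothetical encoded videos
    W_text:     (d_text, d)   assumed linear projection for text
    W_video:    (d_video, d)  assumed linear projection for video
    """
    q = l2_normalize(query_vec @ W_text)    # query in the common space
    v = l2_normalize(video_vecs @ W_video)  # videos in the common space
    scores = v @ q                          # cosine similarity per video
    order = np.argsort(-scores)             # best match first
    return order, scores


# Toy data: identity projections, so the common space equals the input space.
query = np.array([1.0, 0.0, 0.0])
videos = np.array([
    [0.0, 1.0, 0.0],   # orthogonal to the query
    [2.0, 0.0, 0.0],   # same direction as the query
    [1.0, 1.0, 0.0],   # partially aligned
])
order, scores = rank_videos(query, videos, np.eye(3), np.eye(3))
print(order.tolist())  # video 1 ranks first, then video 2, then video 0
```

In a trained system the two projections would be learned jointly so that relevant query-video pairs score higher than irrelevant ones; the sketch only shows the retrieval-time comparison.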

Code & Models

Repositories

danieljf24/hybrid_space
pytorch

Taxonomy

Methods
Interpretability