Reading-strategy Inspired Visual Representation Learning for Text-to-Video Retrieval

Jianfeng Dong; Yabing Wang; Xianke Chen; Xiaoye Qu; Xirong Li; Yuan He; Xun Wang

arXiv:2201.09168·cs.CV·March 4, 2022

TL;DR

This paper introduces RIVRL, a novel video representation learning method inspired by human reading strategies. It improves text-to-video retrieval by capturing both overview and detailed video features, and achieves state-of-the-art results.

Contribution

Proposes a reading-strategy inspired dual-branch framework for video representation learning that enhances cross-modal retrieval performance.
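To make the dual-branch idea concrete, here is a minimal sketch of how a "preview" branch and an "intensive-reading" branch could each summarize a sequence of per-frame features. It is illustrative only and is not RIVRL's architecture: the assumptions are that the previewing branch is plain mean pooling, that the intensive-reading branch is attention-weighted pooling using the preview vector as the query, and that the two outputs are fused by concatenation. All function names (`mean_pool`, `attention_weights`, `intensive_read`, `encode_video`) are hypothetical.

```python
import math
from typing import List

Vector = List[float]


def mean_pool(frames: List[Vector]) -> Vector:
    """Previewing branch (assumed): average frame features into one overview vector."""
    dim = len(frames[0])
    return [sum(f[d] for f in frames) / len(frames) for d in range(dim)]


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weights(frames: List[Vector], query: Vector) -> List[float]:
    """Score each frame by its dot product with the query, then normalize."""
    return softmax([sum(x * q for x, q in zip(f, query)) for f in frames])


def intensive_read(frames: List[Vector], overview: Vector) -> Vector:
    """Intensive-reading branch (hypothetical): attention-weighted pooling,
    with the overview vector from the previewing branch acting as the query."""
    weights = attention_weights(frames, overview)
    dim = len(frames[0])
    return [sum(w * f[d] for w, f in zip(weights, frames)) for d in range(dim)]


def encode_video(frames: List[Vector]) -> Vector:
    """Fuse both branches by concatenation (an assumption, not the paper's fusion)."""
    overview = mean_pool(frames)
    detail = intensive_read(frames, overview)
    return overview + detail


if __name__ == "__main__":
    # Three toy 2-d frame features; the result has 2 overview + 2 detail dims.
    frames = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    print(encode_video(frames))
```

In this toy setup, the third frame aligns best with the overview and therefore receives the largest attention weight, so the detail half of the vector differs from the plain average. A learned model would replace these fixed operations with trainable layers.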

Findings

1. Achieves new state-of-the-art results on the TGIF and VATEX datasets.
2. Performs comparably to, or better than, models trained on large-scale datasets.
3. Effectively captures both overview and detailed video information.

Abstract

This paper addresses the task of text-to-video retrieval: given a query in the form of a natural-language sentence, the goal is to retrieve videos that are semantically relevant to the query from a large collection of unlabeled videos. Success on this task depends on cross-modal representation learning, which projects both videos and sentences into common spaces where their semantic similarity can be computed. In this work, we concentrate on video representation learning, an essential component of text-to-video retrieval. Inspired by the reading strategy of humans, we propose Reading-strategy Inspired Visual Representation Learning (RIVRL) to represent videos. RIVRL consists of two branches: a previewing branch and an intensive-reading branch. The previewing branch is designed to briefly capture the overview information of videos, while the intensive-reading branch is designed to…
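The abstract describes retrieval as computing semantic similarity in a common space shared by videos and sentences. The sketch below shows that generic pattern under stated assumptions and does not reproduce RIVRL's training or scoring. Each modality is mapped into one shared space by a linear projection; in practice the projections would be learned, but here they are hand-set. Videos are scored with cosine similarity, which is a common choice but is assumed here. The abstract mentions "common spaces" (plural), but this sketch uses a single space for simplicity. The names `project`, `cosine`, and `rank_videos` are hypothetical.

```python
import math
from typing import Dict, List

Vector = List[float]
Matrix = List[List[float]]


def project(matrix: Matrix, vec: Vector) -> Vector:
    """Linear map into the common space: one output dimension per matrix row."""
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return sum(x * y for x, y in zip(a, b)) / norm if norm else 0.0


def rank_videos(query: Vector, videos: Dict[str, Vector],
                text_proj: Matrix, video_proj: Matrix) -> List[str]:
    """Return video ids sorted from most to least similar to the query."""
    q = project(text_proj, query)
    scores = {vid: cosine(q, project(video_proj, feat)) for vid, feat in videos.items()}
    return sorted(scores, key=scores.get, reverse=True)


if __name__ == "__main__":
    identity = [[1.0, 0.0], [0.0, 1.0]]  # toy stand-in for learned projections
    videos = {"v1": [0.1, 1.0], "v2": [0.9, 0.3]}
    print(rank_videos([1.0, 0.2], videos, identity, identity))  # ['v2', 'v1']
```

Because ranking only compares scores, the video embeddings can be precomputed once for the whole collection, and each new query needs only one projection plus a similarity pass.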

Code & Models

Repositories

lijiabei-7/rivrl
PyTorch (official)

Taxonomy

Methods: Attentive Walk-Aggregating Graph Neural Network