RaP: Redundancy-aware Video-language Pre-training for Text-Video Retrieval

Xing Wu; Chaochen Gao; Zijia Lin; Zhongyuan Wang; Jizhong Han; Songlin Hu

arXiv:2210.06881·cs.CV·October 14, 2022

TL;DR

RaP introduces a redundancy-aware pre-training approach for text-video retrieval that measures and penalizes inter-modal redundancy, leading to improved performance on multiple benchmarks.

Contribution

The paper proposes a novel redundancy measurement and contrastive learning method to address inter-modal redundancy in video-language pre-training.

Findings

01. Significant improvement over the state of the art on four benchmarks.

02. Effective reduction of inter-modal redundancy.

03. Enhanced shared semantic learning across modalities.

Abstract

Video-language pre-training methods have mainly adopted sparse sampling techniques to alleviate the temporal redundancy of videos. Though effective, sparse sampling still suffers from inter-modal redundancy, which takes two forms: visual redundancy and textual redundancy. Compared with highly generalized text, sparsely sampled frames usually contain text-independent portions, called visual redundancy. Sparse sampling is also likely to miss important frames corresponding to some text portions, resulting in textual redundancy. Inter-modal redundancy leads to a mismatch between video and text information, hindering the model from learning the shared semantics across modalities. To alleviate it, we propose Redundancy-aware Video-language Pre-training. We design a redundancy measurement of video patches and text tokens by calculating the cross-modal minimum dissimilarity. Then, we penalize the high-redundant video…
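
The redundancy measurement described in the abstract can be illustrated with a minimal sketch. It assumes that dissimilarity is one minus cosine similarity and that patch and token embeddings already live in a shared space; neither detail is stated in the abstract, and the function names below are hypothetical rather than taken from the official repository. Under these assumptions, a patch (or token) whose closest counterpart in the other modality is still far away receives a high redundancy score.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def min_dissimilarity(queries, keys):
    """For each query, the minimum of (1 - cosine similarity) over all keys.

    Assumption: dissimilarity = 1 - cosine similarity. The abstract only
    says "cross-modal minimum dissimilarity" without fixing the metric.
    """
    return [
        min(1.0 - cosine_similarity(q, k) for k in keys)
        for q in queries
    ]


def redundancy_scores(patches, tokens):
    """Hypothetical helper: redundancy of each video patch w.r.t. the text,
    and of each text token w.r.t. the video."""
    visual = min_dissimilarity(patches, tokens)
    textual = min_dissimilarity(tokens, patches)
    return visual, textual


# Toy 2-D embeddings, purely illustrative.
patches = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
tokens = [[1.0, 0.0], [1.0, 1.0]]

visual, textual = redundancy_scores(patches, tokens)
print("visual redundancy:", visual)
print("textual redundancy:", textual)
```

In this toy example the third patch points away from every token, so it receives the largest visual redundancy score, while the first patch exactly matches a token and scores zero. How such scores are then used to penalize redundant patches and tokens is not covered by this sketch.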

Code & Models

Repositories

caskcsg/vlp
PyTorch · Official

Taxonomy

Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Cancer-related molecular mechanisms research