Revitalize Region Feature for Democratizing Video-Language Pre-training of Retrieval

Guanyu Cai; Yixiao Ge; Binjie Zhang; Alex Jinpeng Wang; Rui Yan; Xudong Lin; Ying Shan; Lianghua He; Xiaohu Qie; Jianping Wu; Mike Zheng Shou

arXiv:2203.07720 · cs.CV · February 8, 2023 · 1 citation


TL;DR

This paper introduces a novel approach to video-language pre-training that revitalizes region features to reduce redundancy, enabling state-of-the-art retrieval performance with significantly less data and training time.

Contribution

The paper proposes a bidirectional region-word alignment regularization that strengthens the fine-grained relations between regions and words, improving both the efficiency and the effectiveness of video-language pre-training (VLP).
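To make the idea of bidirectional region-word alignment concrete, here is a minimal sketch. It assumes a FILIP-style token-wise max matching: each region is scored against its most similar word (region-to-word), each word against its most similar region (word-to-region), and the two directional averages are combined with equal weight. The function names, the use of raw cosine similarity, and the equal weighting are hypothetical; this page does not give the authors' exact formulation, loss, or feature dimensions.

```python
import math
from typing import List

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0 else 0.0


def bidirectional_alignment(regions: List[Vector], words: List[Vector]) -> float:
    """Hypothetical region-word alignment score using max matching in both directions.

    Assumption: no learned projection or temperature; embeddings are compared directly.
    """
    # sim[i][j] = similarity between region i and word j
    sim = [[cosine(r, w) for w in words] for r in regions]
    # Region-to-word: each region keeps only its best-matching word.
    region_to_word = sum(max(row) for row in sim) / len(regions)
    # Word-to-region: each word keeps only its best-matching region.
    word_to_region = sum(
        max(sim[i][j] for i in range(len(regions))) for j in range(len(words))
    ) / len(words)
    # Equal weighting of the two directions is an assumed choice.
    return 0.5 * (region_to_word + word_to_region)


score = bidirectional_alignment(
    [[1.0, 0.0], [0.0, 1.0]],              # two region embeddings
    [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],  # three word embeddings
)
```

In this toy case every region finds an exact word match, but the third word only partially matches either region, so the score is about 0.951 rather than 1. For matched video-text pairs, a training objective could push this score up (for example by minimizing `1 - score`, or by using it as the similarity inside a batch-level contrastive loss); that choice is also an assumption.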

Findings

1. Achieves competitive results with 80% less data
2. Reduces pre-training time by 85%
3. Outperforms previous methods on multiple datasets

Abstract

Recent dominant methods for video-language pre-training (VLP) learn transferable representations from raw pixels in an end-to-end manner to achieve strong performance on downstream video-language retrieval. Despite these impressive results, VLP research has become extremely expensive because it requires massive data and long training times, which hinders further exploration. In this work, we revitalize region features of sparsely sampled video clips to significantly reduce both spatial and temporal visual redundancy, democratizing VLP research while achieving state-of-the-art results. Specifically, to fully explore the potential of region features, we introduce a novel bidirectional region-word alignment regularization that properly optimizes the fine-grained relations between regions and certain words in sentences, eliminating the domain/modality disconnections between…
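The abstract attributes the savings to two reductions: temporal (sparsely sampled frames) and spatial (a handful of region features per frame instead of dense pixels). A minimal sketch of both steps follows. It is an illustration under stated assumptions, not the authors' pipeline: `sparse_sample_indices` and `top_k_regions` are hypothetical helpers, segment-centre sampling and confidence-ranked region selection are assumed choices, and this page does not specify the sampling rate, the region detector, or the number of regions kept.

```python
from typing import List, Tuple

# A detection is (confidence, feature_vector); this layout is a hypothetical choice.
Detection = Tuple[float, List[float]]


def sparse_sample_indices(num_frames: int, num_samples: int) -> List[int]:
    """Temporal reduction: pick one frame from the centre of each equal-length segment.

    Segment-centre sampling is an assumed scheme, not necessarily the paper's.
    """
    if num_frames <= 0 or num_samples <= 0:
        raise ValueError("num_frames and num_samples must be positive")
    num_samples = min(num_samples, num_frames)  # cannot sample more frames than exist
    segment = num_frames / num_samples
    return [int(segment * i + segment / 2) for i in range(num_samples)]


def top_k_regions(detections: List[Detection], k: int) -> List[Detection]:
    """Spatial reduction: keep the k highest-confidence detected regions of one frame.

    Confidence-ranked selection and the value of k are assumptions.
    """
    return sorted(detections, key=lambda d: d[0], reverse=True)[:k]


# Example: 4 frames from a 300-frame clip, then 2 regions from one frame's detections.
frames = sparse_sample_indices(300, 4)
regions = top_k_regions([(0.2, [0.0]), (0.9, [1.0]), (0.5, [2.0])], 2)
```

Under these assumptions, the number of visual tokens per clip is bounded by `num_samples * k` regardless of the clip's length or frame resolution, which is the kind of redundancy reduction the abstract describes.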


Taxonomy

Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning