Bi-Calibration Networks for Weakly-Supervised Video Representation Learning

Fuchen Long; Ting Yao; Zhaofan Qiu; Xinmei Tian; Jiebo Luo; and Tao Mei

arXiv:2206.10491·cs.CV·June 22, 2022

TL;DR

This paper introduces Bi-Calibration Networks (BCN), a novel weakly-supervised video representation learning method leveraging web videos and textual data, achieving superior downstream task performance.

Contribution

The paper proposes a new mutual calibration approach between query and text, along with large-scale web video datasets, to improve weakly-supervised video representation learning.

Findings

01. BCN outperforms state-of-the-art methods on downstream tasks.
02. Large-scale datasets YOVO-3M and YOVO-10M enable effective training.
03. Fine-tuning on 10M videos yields 1.6-1.8% accuracy improvements.

Abstract

Leveraging large volumes of web videos paired with search queries or surrounding text (e.g., the title) offers an economical and extensible alternative to supervised video representation learning. Nevertheless, modeling such weak visual-textual connections is not trivial due to query polysemy (i.e., many possible meanings for a query) and text isomorphism (i.e., the same syntactic structure shared by different texts). In this paper, we introduce a new design of mutual calibration between query and text to boost weakly-supervised video representation learning. Specifically, we present Bi-Calibration Networks (BCN), which couple two calibrations to learn the amendment from text to query and vice versa. Technically, BCN executes clustering on all the titles of the videos searched by an identical query and takes the centroid of each cluster as a text prototype. The query vocabulary is…
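The text-prototype step described in the abstract (cluster the titles retrieved by one query, keep each cluster centroid) can be illustrated with a minimal sketch. This is not the authors' implementation: the excerpt does not name the clustering algorithm or the title encoder, so plain k-means over precomputed title embeddings is assumed here, and the function name `text_prototypes` and its parameters are hypothetical.

```python
import numpy as np

def text_prototypes(title_embeddings, num_prototypes, num_iters=20, seed=0):
    """Cluster the title embeddings of ONE query and return the centroids.

    Assumption: plain k-means stands in for whatever clustering BCN uses.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(title_embeddings, dtype=np.float64)
    # Initialise centroids from distinct, randomly chosen titles.
    init = rng.choice(len(x), size=num_prototypes, replace=False)
    centroids = x[init].copy()
    for _ in range(num_iters):
        # Squared distance from every title to every centroid: shape (n, k).
        dists = ((x[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
        assign = dists.argmin(axis=1)
        for k in range(num_prototypes):
            members = x[assign == k]
            if len(members) > 0:  # an empty cluster keeps its old centroid
                centroids[k] = members.mean(axis=0)
    # Recompute the assignment against the final centroids.
    dists = ((x[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
    return centroids, dists.argmin(axis=1)

# Toy 2-D "embeddings" for four titles returned by a single query
# (hypothetical data: two titles per meaning of a polysemous query).
titles = np.array([[0.0, 0.0], [0.0, 0.0], [10.0, 10.0], [10.0, 10.0]])
prototypes, assignment = text_prototypes(titles, num_prototypes=2)
print(prototypes)
print(assignment)
```

Each row of `prototypes` plays the role of one text prototype for the query, and `assignment` records which prototype each title falls under. In practice the embeddings would come from a text encoder, which this sketch leaves out.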

Code & Models

Repositories

fuchenustc/bcn (Official)

Taxonomy

Topics: Multimodal Machine Learning Applications · Human Pose and Action Recognition · Domain Adaptation and Few-Shot Learning

Methods: Temporal Difference Network