VLG: General Video Recognition with Web Textual Knowledge

Jintao Lin; Zhaoyang Liu; Wenhai Wang; Wayne Wu; Limin Wang

arXiv:2212.01638·cs.CV·December 6, 2022

TL;DR

This paper introduces VLG, a unified visual-linguistic framework leveraging web textual knowledge for general video recognition across diverse challenging settings, and establishes a comprehensive benchmark for this task.

Contribution

The paper proposes a two-stage training paradigm for general video recognition (GVR) that uses external web text, and builds a new benchmark, Kinetics-GVR, whose four sub-task datasets cover the close-set, long-tail, few-shot and open-set settings.

Findings

1. VLG achieves state-of-the-art results across all tested settings.

2. The framework demonstrates strong generalization and effectiveness.

3. The benchmark facilitates future research in general video recognition.

Abstract

Video recognition in an open and dynamic world is quite challenging, as we need to handle different settings such as close-set, long-tail, few-shot and open-set. By leveraging semantic knowledge from noisy text descriptions crawled from the Internet, we focus on the general video recognition (GVR) problem of solving different recognition tasks within a unified framework. The core contribution of this paper is twofold. First, we build a comprehensive video recognition benchmark of Kinetics-GVR, including four sub-task datasets to cover the mentioned settings. To facilitate the research of GVR, we propose to utilize external textual knowledge from the Internet and provide multi-source text descriptions for all action classes. Second, inspired by the flexibility of language representation, we present a unified visual-linguistic framework (VLG) to solve the problem of GVR by an effective…
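The abstract is cut off before it describes the method, so the following is only a generic sketch of how text descriptions of classes can drive recognition, not VLG's actual architecture. It assumes a video and each class description have already been mapped to vectors in a shared embedding space; the embeddings, the `classify` helper, and the `reject_below` threshold are all hypothetical and chosen for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def classify(video_emb, class_text_embs, reject_below=None):
    """Pick the class whose text embedding is most similar to the video embedding.

    If reject_below is set and the best similarity falls under it, return None,
    which is one simple (assumed) way to flag an open-set "unknown" sample.
    """
    scores = {label: cosine(video_emb, emb) for label, emb in class_text_embs.items()}
    best = max(scores, key=scores.get)
    if reject_below is not None and scores[best] < reject_below:
        return None
    return best


# Toy, made-up embeddings standing in for encoded class descriptions.
class_text_embs = {
    "playing guitar": [1.0, 0.0, 0.0],
    "surfing": [0.0, 1.0, 0.0],
}

known = classify([0.9, 0.1, 0.0], class_text_embs)
unknown = classify([0.0, 0.0, 1.0], class_text_embs, reject_below=0.5)
```

Because classes are represented by text rather than by fixed classifier weights, a new class can be added in this sketch simply by inserting another entry into `class_text_embs`, which is one reason language representations are attractive when a single framework has to serve several settings.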

Code & Models

Repositories

mcg-nju/vlg (PyTorch, official)

Taxonomy

Topics: Multimodal Machine Learning Applications · Human Pose and Action Recognition · Domain Adaptation and Few-Shot Learning