Static and Dynamic Concepts for Self-supervised Video Representation Learning

Rui Qian; Shuangrui Ding; Xian Liu; Dahua Lin

arXiv:2207.12795 · cs.CV · July 27, 2022 · 1 citation

TL;DR

This paper introduces a self-supervised video representation learning method that models static and dynamic concepts separately, using a novel learning scheme with regularizations and cross-attention, achieving state-of-the-art results on multiple datasets.

Contribution

It proposes a new approach to decouple static and dynamic concepts in videos using static frames and frame differences, with regularizations and cross-attention for improved understanding.
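As a rough illustration of the two inputs named above, the sketch below builds a static frame and a stack of frame differences from a clip. The function name `static_and_dynamic_inputs`, the `(T, H, W, C)` layout, and the choice of a randomly sampled frame as the static input are assumptions made for this example, not details confirmed by the paper.

```python
import numpy as np


def static_and_dynamic_inputs(clip, rng=None):
    """Split a clip into a static input and a dynamic input.

    clip: array of shape (T, H, W, C) with T >= 2 (assumed layout).
    Returns (static_frame, frame_diff), where static_frame has shape
    (H, W, C) and frame_diff has shape (T - 1, H, W, C).
    """
    clip = np.asarray(clip, dtype=np.float32)
    if clip.ndim != 4 or clip.shape[0] < 2:
        raise ValueError("expected a clip of shape (T, H, W, C) with T >= 2")

    rng = np.random.default_rng() if rng is None else rng

    # Static input: one frame sampled from the clip (an assumption here).
    idx = int(rng.integers(clip.shape[0]))
    static_frame = clip[idx]

    # Dynamic input: differences between consecutive frames, which cancel
    # appearance that does not change and keep what moves.
    frame_diff = clip[1:] - clip[:-1]
    return static_frame, frame_diff
```

For a clip whose frames are all identical, `frame_diff` is all zeros while `static_frame` still carries the full appearance, which is the intuition behind using the two inputs to separate static and dynamic cues.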

Findings

1. Achieves state-of-the-art accuracy on UCF-101, HMDB-51, and Diving-48 datasets.

2. Effectively disentangles static and dynamic concepts for better video representation.

3. Demonstrates the importance of local concept attention in video understanding.

Abstract

In this paper, we propose a novel learning scheme for self-supervised video representation learning. Motivated by how humans understand videos, we propose to first learn general visual concepts and then attend to discriminative local areas for video understanding. Specifically, we utilize static frames and frame differences to help decouple static and dynamic concepts, and align the respective concept distributions in latent space. We add diversity and fidelity regularizations to guarantee that we learn a compact set of meaningful concepts. Then we employ a cross-attention mechanism to aggregate detailed local features of different concepts, and filter out redundant concepts with low activations to perform local concept contrast. Extensive experiments demonstrate that our method distills meaningful static and dynamic concepts to guide video understanding, and obtains state-of-the-art…
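The cross-attention and filtering step described in the abstract can be sketched roughly as follows. This is a minimal NumPy illustration, not the authors' implementation: the function `aggregate_concepts`, the scaled dot-product scoring, the use of each concept's maximum score as its "activation", and the `keep_ratio` parameter are all assumptions introduced for this example.

```python
import numpy as np


def softmax(x, axis):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def aggregate_concepts(concepts, local_feats, keep_ratio=0.5):
    """Aggregate local features per concept and drop weakly activated concepts.

    concepts:    (K, D) concept vectors used as attention queries.
    local_feats: (N, D) local features used as keys and values.
    keep_ratio:  fraction of concepts to keep (hypothetical parameter).
    Returns (kept_idx, aggregated) with aggregated of shape (n_keep, D).
    """
    num_concepts, dim = concepts.shape

    # Scaled dot-product scores between every concept and every location.
    scores = concepts @ local_feats.T / np.sqrt(dim)  # (K, N)

    # Each concept distributes its attention over the N locations.
    attn = softmax(scores, axis=1)
    aggregated = attn @ local_feats  # (K, D)

    # Assumed activation measure: the strongest response of each concept.
    activation = scores.max(axis=1)

    # Keep only the most activated concepts; the rest are filtered out.
    n_keep = max(1, int(round(num_concepts * keep_ratio)))
    kept_idx = np.sort(np.argsort(-activation)[:n_keep])
    return kept_idx, aggregated[kept_idx]
```

The kept per-concept features would then be the inputs to a local contrastive objective; that loss is omitted here because the truncated abstract does not specify its form.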

Code & Models

Repositories

shvdiwnkozbw/Self-supervised-Video-Concept (PyTorch, official)

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications · Human Pose and Action Recognition

Methods: ALIGN