Enhancing Self-supervised Video Representation Learning via Multi-level Feature Optimization

Rui Qian; Yuxi Li; Huabin Liu; John See; Shuangrui Ding; Xian Liu; Dian Li; Weiyao Lin

arXiv:2108.02183·cs.CV·August 18, 2021


Open Access · 1 Repo

TL;DR

This paper introduces a multi-level feature optimization framework that enhances self-supervised video representation learning by integrating high-, mid-, and low-level features with graph constraints and temporal modeling, leading to improved video understanding.

Contribution

It proposes a novel multi-level feature optimization method that leverages distribution graphs and temporal modules to improve generalization and motion understanding in self-supervised video learning.

Findings

01 Significant improvement in video representation quality.

02 Enhanced temporal modeling capabilities.

03 Better generalization across video understanding tasks.

Abstract

The crux of self-supervised video representation learning is to build general features from unlabeled videos. However, most recent works have mainly focused on high-level semantics and neglected lower-level representations and their temporal relationship, which are crucial for general video understanding. To address these challenges, this paper proposes a multi-level feature optimization framework to improve the generalization and temporal modeling ability of learned video representations. Concretely, high-level features obtained from naive and prototypical contrastive learning are utilized to build distribution graphs, guiding the process of low-level and mid-level feature learning. We also devise a simple temporal modeling module from multi-level features to enhance motion pattern learning. Experiments demonstrate that multi-level feature optimization with the graph constraint and…
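The idea of a distribution graph built from high-level features and used to guide lower-level features can be illustrated with a small sketch. This is not the authors' implementation: the function names (`similarity_graph`, `graph_constraint_loss`), the softmax-over-cosine-similarity graph, the temperature value, and the KL-divergence objective are all assumptions chosen for illustration. It uses only the Python standard library and expects at least two samples per batch.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (left unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def similarity_graph(features, temperature=0.1):
    """Build a row-stochastic graph over a batch of feature vectors.

    Row i is a softmax over the cosine similarities between sample i
    and every other sample in the batch (self-similarity is excluded).
    """
    feats = [l2_normalize(f) for f in features]
    graph = []
    for i, fi in enumerate(feats):
        logits = [
            sum(a * b for a, b in zip(fi, fj)) / temperature
            for j, fj in enumerate(feats)
            if j != i
        ]
        peak = max(logits)  # subtract the max for numerical stability
        exps = [math.exp(v - peak) for v in logits]
        total = sum(exps)
        graph.append([e / total for e in exps])
    return graph


def graph_constraint_loss(high_feats, low_feats, temperature=0.1):
    """Mean KL(target || pred) between two similarity graphs.

    The target graph comes from high-level features and the predicted
    graph from lower-level features (an assumed form of the constraint).
    """
    target = similarity_graph(high_feats, temperature)
    pred = similarity_graph(low_feats, temperature)
    eps = 1e-12  # avoids log(0)
    total = 0.0
    for t_row, p_row in zip(target, pred):
        total += sum(
            t * (math.log(t + eps) - math.log(p + eps))
            for t, p in zip(t_row, p_row)
        )
    return total / len(target)
```

In this sketch the loss is zero when the lower-level features induce the same neighbor distribution as the high-level ones, and grows as the two graphs disagree. In a training setup the target graph would typically be treated as fixed (no gradient), though that is also an assumption here.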


Code & Models

Repositories

shvdiwnkozbw/video-representation-via-multi-level-optimization
PyTorch · Official


Taxonomy

Topics: Multimodal Machine Learning Applications · Human Pose and Action Recognition · Domain Adaptation and Few-Shot Learning

Methods: Contrastive Learning