Self-Supervised Learning for Videos: A Survey

Madeline C. Schiappa; Yogesh S. Rawat; Mubarak Shah

arXiv:2207.00419 · cs.CV · July 20, 2023 · 5 cites

TL;DR

This survey reviews self-supervised learning methods for videos, covering their categories, datasets, evaluation tasks, limitations, and future directions, with emphasis on the challenges and opportunities unique to the video domain.

Contribution

It categorizes existing self-supervised video learning approaches, providing a comprehensive overview and insights into current limitations and future research directions.

Findings

01

Self-supervised learning reduces dependence on annotated data for videos.

02

The survey groups methods into four main categories: pretext tasks, generative, contrastive, and cross-modal.

03

The survey identifies key datasets and evaluation benchmarks for video self-supervised learning.

Abstract

The remarkable success of deep learning in various domains relies on the availability of large-scale annotated datasets. However, obtaining annotations is expensive and requires great effort, which is especially challenging for videos. Moreover, the use of human-generated annotations leads to models with biased learning and poor domain generalization and robustness. As an alternative, self-supervised learning provides a way to learn representations that does not require annotations and has shown promise in both image and video domains. Unlike in the image domain, learning video representations is more challenging because of the temporal dimension, which brings in motion and other environmental dynamics. This also provides opportunities for video-exclusive ideas that advance self-supervised learning in the video and multimodal domain. In this survey, we provide a review of existing…

Code & Models

Repositories

Maddy12/SSL4VideoSurvey
Official

Taxonomy

Topics: Music and Audio Processing · Education and Learning Interventions · Video Analysis and Summarization