Data Collection-free Masked Video Modeling
Yuchi Ishikawa, Masayoshi Kondo, Yoshimitsu Aoki

TL;DR
This paper introduces a self-supervised video pre-training method that uses pseudo-motion videos generated from static images, reducing data collection costs and privacy concerns while effectively learning spatio-temporal features.
Contribution
It proposes the Pseudo Motion Generator (PMG) module that creates pseudo-motion videos from images, enabling data collection-free pre-training for video transformers.
Findings
Pre-training on pseudo-motion videos significantly improves action recognition accuracy.
The method outperforms existing pre-training methods that use only static images.
It partially outperforms methods pre-trained on real and synthetic videos.
Abstract
Pre-training video transformers generally requires a large amount of data, presenting significant challenges in terms of data collection costs and concerns related to privacy, licensing, and inherent biases. Synthesizing data is one of the promising ways to solve these issues, yet pre-training solely on synthetic data has its own challenges. In this paper, we introduce an effective self-supervised learning framework for videos that leverages readily available and less costly static images. Specifically, we define the Pseudo Motion Generator (PMG) module that recursively applies image transformations to generate pseudo-motion videos from images. These pseudo-motion videos are then leveraged in masked video modeling. Our approach is applicable to synthetic images as well, thus entirely freeing video pre-training from data collection costs and other concerns in real data. Through…
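The recursive generation idea described in the abstract can be sketched as follows. This is a minimal illustration and not the authors' implementation: the summary does not say which image transformations PMG uses, so a cyclic pixel shift stands in for them here, and the names `shift`, `make_random_shift`, and `pseudo_motion_video` are hypothetical. The point it shows is only the recursion: each frame is produced by applying the same transformation to the previous frame, so a single static image unrolls into a clip with consistent apparent motion.

```python
import random
from typing import Callable, List

Image = List[List[float]]  # grayscale image stored as rows of pixel values


def shift(image: Image, dy: int, dx: int) -> Image:
    """Cyclically translate an image by (dy, dx) pixels (stand-in transformation)."""
    h, w = len(image), len(image[0])
    return [[image[(y - dy) % h][(x - dx) % w] for x in range(w)] for y in range(h)]


def make_random_shift(rng: random.Random, max_step: int = 2) -> Callable[[Image], Image]:
    """Sample one fixed shift; reusing it for every frame gives a constant velocity."""
    dy = rng.randint(-max_step, max_step)
    dx = rng.randint(-max_step, max_step)
    return lambda img: shift(img, dy, dx)


def pseudo_motion_video(
    image: Image, num_frames: int, transform: Callable[[Image], Image]
) -> List[Image]:
    """Build a clip by recursively applying `transform` to the previous frame."""
    frames = [image]
    for _ in range(num_frames - 1):
        frames.append(transform(frames[-1]))
    return frames


# Usage: turn a tiny 4x4 image into an 8-frame pseudo-motion clip.
rng = random.Random(0)
image = [[float(y * 4 + x) for x in range(4)] for y in range(4)]
video = pseudo_motion_video(image, num_frames=8, transform=make_random_shift(rng))
```

In a pre-training pipeline, clips produced this way would be fed to a masked video modeling objective in place of real videos; the masking and reconstruction steps are outside the scope of this sketch.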
Taxonomy
Topics: Computer Graphics and Visualization Techniques
