Learning Video Representations without Natural Videos

Xueyang Yu; Xinlei Chen; Yossi Gandelsman

arXiv:2410.24213 · cs.CV · November 20, 2024

TL;DR

This paper shows that effective video representations can be learned from synthetic videos and natural images, without any natural videos, by pre-training on a progression of simply generated datasets that model a growing set of natural video properties.

Contribution

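It introduces a progression of synthetic video datasets, produced by simple generative processes, that model a growing set of natural video properties (e.g., motion, acceleration, and shape transformations) and can be used to pre-train video models without natural video data.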
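
As a rough illustration of what a "simple generative process" for video could look like, the sketch below renders a single circle that moves with constant velocity and, optionally, constant acceleration. It is not the authors' pipeline: the function name `make_synthetic_clip`, every parameter, and all default values are hypothetical and chosen only to make the idea concrete.

```python
import random


def make_synthetic_clip(num_frames=8, size=32, radius=4, accelerate=False, seed=0):
    """Render one moving circle on a black background (hypothetical sketch).

    Returns a list of frames; each frame is a size x size list of 0/1 ints.
    """
    rng = random.Random(seed)
    # Random start position (circle fully inside the frame) and velocity.
    x = rng.uniform(radius, size - radius)
    y = rng.uniform(radius, size - radius)
    vx, vy = rng.uniform(-2, 2), rng.uniform(-2, 2)
    # Optional constant acceleration, one step further in the "progression".
    if accelerate:
        ax, ay = rng.uniform(-0.5, 0.5), rng.uniform(-0.5, 0.5)
    else:
        ax, ay = 0.0, 0.0

    frames = []
    for _ in range(num_frames):
        # A pixel is 1 when it lies inside the circle centred at (x, y).
        frame = [
            [1 if (c - x) ** 2 + (r - y) ** 2 <= radius ** 2 else 0
             for c in range(size)]
            for r in range(size)
        ]
        frames.append(frame)
        # Simple Euler update of velocity, then position.
        vx, vy = vx + ax, vy + ay
        x, y = x + vx, y + vy
    return frames


clip_linear = make_synthetic_clip(accelerate=False)
clip_accel = make_synthetic_clip(accelerate=True)
```

With the same seed, the two clips start from the same frame and differ only in how the circle moves afterwards, which mirrors the idea of adding one video property at a time.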

Findings

01

A VideoMAE model pre-trained on the synthetic videos closes 97.2% of the UCF101 action-classification gap between training from scratch and self-supervised pre-training on natural videos.

02

Models pre-trained on synthetic data outperform the model pre-trained on natural videos on HMDB51 and on several out-of-distribution datasets.

03

Dataset properties like frame diversity and similarity to natural data correlate with downstream task performance.

Abstract

We show that useful video representations can be learned from synthetic videos and natural images, without incorporating natural videos in the training. We propose a progression of video datasets synthesized by simple generative processes, that model a growing set of natural video properties (e.g., motion, acceleration, and shape transformations). The downstream performance of video models pre-trained on these generated datasets gradually increases with the dataset progression. A VideoMAE model pre-trained on our synthetic videos closes 97.2% of the performance gap on UCF101 action classification between training from scratch and self-supervised pre-training from natural videos, and outperforms the pre-trained model on HMDB51. Introducing crops of static images to the pre-training stage results in similar performance to UCF101 pre-training and outperforms the UCF101 pre-trained model…
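
One plausible reading of "closes 97.2% of the performance gap" is the share of the accuracy difference between the from-scratch baseline and the natural-video pre-trained model that the synthetic pre-trained model recovers. The sketch below computes that ratio under this assumption. The helper name `gap_closed` is hypothetical, and the accuracies in the usage line are made-up placeholders, not numbers from the paper.

```python
def gap_closed(acc_scratch, acc_synthetic, acc_natural):
    """Percentage of the scratch-to-natural accuracy gap recovered (hypothetical helper).

    acc_scratch:   accuracy when training from scratch
    acc_synthetic: accuracy after pre-training on synthetic data
    acc_natural:   accuracy after pre-training on natural videos
    """
    gap = acc_natural - acc_scratch
    if gap == 0:
        raise ValueError("the two reference accuracies are equal; the gap is undefined")
    return 100.0 * (acc_synthetic - acc_scratch) / gap


# Placeholder accuracies for illustration only (not taken from the paper).
example = gap_closed(acc_scratch=50.0, acc_synthetic=90.0, acc_natural=100.0)
```

A value above 100 under this definition would mean the synthetic pre-trained model exceeds the natural-video pre-trained one on that benchmark.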
