Online pre-training with long-form videos

Itsuki Kato; Kodai Kamiya; Toru Tamaki

arXiv:2408.15651·cs.CV·August 29, 2024

TL;DR

This paper explores online pre-training methods using long-form videos, comparing masked image modeling, contrastive learning, and knowledge distillation, and finds contrastive learning most effective for action recognition tasks.

Contribution

The paper evaluates three online pre-training techniques (masked image modeling, contrastive learning, and knowledge distillation) on long-form videos and shows that contrastive learning is the most effective of the three for downstream action recognition.

Findings

01 Contrastive learning outperforms other methods in downstream tasks.

02 Learning from long-form videos benefits short video action recognition.

03 Online pre-training improves model performance on action recognition.

Abstract

In this study, we investigate the impact of online pre-training with continuous video clips. We examine three pre-training methods (masked image modeling, contrastive learning, and knowledge distillation) and assess their performance on downstream action recognition tasks. Online pre-training with contrastive learning showed the highest performance in downstream tasks. Our findings suggest that learning from long-form videos can be helpful for action recognition with short videos.
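To make the contrastive setting mentioned in the abstract more concrete, here is a minimal sketch of an InfoNCE-style loss computed over clips read in order from a long video. This is an illustration under assumptions, not the authors' implementation: the abstract does not say how positive pairs are formed, so treating temporally adjacent clips as positives is a choice made here for the example, and the names `info_nce`, `stream_pairs`, and the `temperature` value are hypothetical. Embeddings are plain Python lists standing in for the output of a video encoder.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def info_nce(anchors, positives, temperature=0.1):
    """Mean InfoNCE loss: anchors[i] is paired with positives[i];
    every other row of `positives` serves as a negative for anchors[i]."""
    a = [l2_normalize(v) for v in anchors]
    p = [l2_normalize(v) for v in positives]
    total = 0.0
    for i, ai in enumerate(a):
        # Cosine similarity of anchor i to every candidate, scaled by temperature.
        logits = [sum(x * y for x, y in zip(ai, pj)) / temperature for pj in p]
        # Numerically stable log-sum-exp over all candidates.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_denom - logits[i]
    return total / len(a)


def stream_pairs(clip_embeddings, batch_size):
    """Yield (anchors, positives) batches in stream order, visiting each clip once.
    ASSUMPTION: clip t+1 is used as the positive for clip t."""
    last = len(clip_embeddings) - 1
    for start in range(0, last, batch_size):
        end = min(start + batch_size, last)
        yield clip_embeddings[start:end], clip_embeddings[start + 1:end + 1]
```

In an online setting, each batch produced by `stream_pairs` would be used for a single update and then discarded, which is what distinguishes this from shuffled, multi-epoch training on short clips. In-batch negatives drawn from the same stream are themselves temporally close to the anchor, so a real system would likely need a more careful choice of negatives than this sketch makes.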

Taxonomy

Topics: Intelligent Tutoring Systems and Adaptive Learning · Online Learning and Analytics · Educational Games and Gamification