Online pre-training with long-form videos
Itsuki Kato, Kodai Kamiya, Toru Tamaki

TL;DR
This paper explores online pre-training methods using long-form videos, comparing masked image modeling, contrastive learning, and knowledge distillation, and finds contrastive learning most effective for action recognition tasks.
Contribution
It introduces an evaluation of multiple online pre-training techniques on long videos and demonstrates the effectiveness of contrastive learning for downstream action recognition.
Findings
Contrastive learning outperforms other methods in downstream tasks.
Learning from long-form videos benefits short video action recognition.
Online pre-training improves model performance on action recognition.
Abstract
In this study, we investigate the impact of online pre-training with continuous video clips. We examine three pre-training methods (masked image modeling, contrastive learning, and knowledge distillation) and assess their performance on downstream action recognition tasks. Online pre-training with contrastive learning achieved the highest performance on downstream tasks. Our findings suggest that learning from long-form videos can be helpful for action recognition with short videos.
Taxonomy
Topics: Intelligent Tutoring Systems and Adaptive Learning · Online Learning and Analytics · Educational Games and Gamification
