Self-supervised Video Representation Learning with Cascade Positive Retrieval
Cheng-En Wu, Farley Lai, Yu Hen Hu, Asim Kadav

TL;DR
This paper introduces Cascade Positive Retrieval (CPR), a novel self-supervised learning method that progressively mines positive video examples across multiple views and stages, significantly improving video retrieval and action recognition performance.
Contribution
The paper proposes CPR, a new multi-stage positive example mining approach for self-supervised video representation learning, enhancing retrieval accuracy and downstream task performance.
Findings
CPR achieves a median class mining recall of 83.3%, outperforming previous methods.
CPR raises the state-of-the-art video retrieval R@1 to 56.7%.
CPR enhances action recognition accuracy on UCF101 and HMDB51 datasets.
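Video retrieval R@k is usually measured by nearest-neighbour search. The sketch below assumes the common protocol for UCF101/HMDB51 retrieval: test-split clips query the training split, and a query counts as a hit if any of its k nearest neighbours (by cosine similarity) shares its action class. The function name recall_at_k and the toy features are hypothetical and are not taken from the paper's code.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def recall_at_k(query_feats: List[Vector], query_labels: List[str],
                gallery_feats: List[Vector], gallery_labels: List[str],
                k: int) -> float:
    """Fraction of queries whose top-k gallery neighbours contain a same-class item.

    Hypothetical helper illustrating the assumed retrieval protocol.
    """
    hits = 0
    for q, q_label in zip(query_feats, query_labels):
        # Rank every gallery item by similarity to the query, most similar first.
        order = sorted(range(len(gallery_feats)),
                       key=lambda i: cosine(q, gallery_feats[i]),
                       reverse=True)
        # A hit if any of the k nearest neighbours shares the query's class.
        if any(gallery_labels[i] == q_label for i in order[:k]):
            hits += 1
    return hits / len(query_feats)


# Toy example: 3 gallery (training) clips and 2 query (test) clips.
gallery = [[1.0, 0.0], [0.0, 1.0], [0.8, 0.2]]
gallery_labels = ["a", "b", "a"]
queries = [[0.9, 0.1], [0.1, 0.9]]
query_labels = ["a", "a"]

print(recall_at_k(queries, query_labels, gallery, gallery_labels, k=1))  # → 0.5
print(recall_at_k(queries, query_labels, gallery, gallery_labels, k=2))  # → 1.0
```

The second query's nearest neighbour belongs to class "b", so it misses at k=1. A same-class clip appears in its top two, which is why R@2 reaches 1.0.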
Abstract
Self-supervised video representation learning has been shown to effectively improve downstream tasks such as video retrieval and action recognition. In this paper, we present Cascade Positive Retrieval (CPR), which successively mines positive examples w.r.t. the query for contrastive learning in a cascade of stages. Specifically, CPR exploits multiple views of a query example in different modalities, where an alternative view may help find another positive example that is dissimilar to the query in the query view. We explore the effects of possible CPR configurations in ablations, including the number of mining stages, the top similar example selection ratio in each stage, and progressive training that incrementally increases the final Top-k selection. The overall mining quality is measured to reflect the recall across training set classes. CPR reaches a median class mining recall of 83.3%, outperforming…
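The cascade described in the abstract can be sketched as successive filtering of a candidate pool. This is a minimal illustration under stated assumptions, not the authors' implementation. It assumes that each training example has an embedding in every view (here two hypothetical views named "rgb" and "flow"), that each stage ranks the surviving candidates by cosine similarity to the query in one view and keeps the top fraction given by that stage's selection ratio, and that the final Top-k is taken from the last stage's ranking. The function name, view names, and toy embeddings are all hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two equal-length embeddings."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def cascade_positive_retrieval(query: int,
                               views: Dict[str, List[Vector]],
                               stages: List[Tuple[str, float]],
                               top_k: int) -> List[int]:
    """Hypothetical cascade miner: each stage is (view name, keep ratio)."""
    # Start from every other example in the bank; the query itself is excluded.
    num_examples = len(next(iter(views.values())))
    candidates = [i for i in range(num_examples) if i != query]
    for view_name, keep_ratio in stages:
        bank = views[view_name]
        # Rank surviving candidates by similarity to the query in this view.
        ranked = sorted(candidates,
                        key=lambda i: cosine(bank[query], bank[i]),
                        reverse=True)
        # Keep the top fraction (at least one) for the next stage.
        keep = max(1, math.ceil(keep_ratio * len(ranked)))
        candidates = ranked[:keep]
    # Final Top-k selection from the last stage's ordering.
    return candidates[:top_k]


# Toy bank of 6 examples; example 0 is the query.
views = {
    "rgb":  [[1.0, 0.0], [0.9, 0.1], [0.6, 0.4], [0.0, 1.0], [0.7, 0.3], [-1.0, 0.0]],
    "flow": [[1.0, 0.0], [0.0, 1.0], [1.0, 0.05], [1.0, 0.0], [0.5, 0.5], [1.0, 0.0]],
}
positives = cascade_positive_retrieval(0, views, [("rgb", 0.5), ("flow", 1.0)], top_k=2)
print(positives)  # → [2, 4]
```

Example 2 ranks only third in the "rgb" view. It still survives the loose first-stage ratio and then becomes the top positive in the "flow" view. This mirrors the abstract's point that an alternative view can surface a positive that looks less similar in the query view. Examples 3 and 5 match the query in "flow" but are filtered out by the first stage, which shows how the stage order and selection ratios trade recall against precision.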
Taxonomy
Topics: Multimodal Machine Learning Applications · Human Pose and Action Recognition · Cancer-related molecular mechanisms research
Methods: Contrastive Learning
