Back to the Future: Cycle Encoding Prediction for Self-supervised Contrastive Video Representation Learning
Xinyu Yang, Majid Mirmehdi, Tilo Burghardt

TL;DR
This paper introduces Cycle Encoding Prediction (CEP), a self-supervised learning method that encodes high-level spatio-temporal structures in videos by predicting temporal cycles, improving action recognition performance.
Contribution
The paper proposes a novel self-supervised learning approach, CEP, that captures temporal cycle structures in videos for better feature representation in action recognition.
Findings
Action recognition accuracy improves significantly on the UCF101 and HMDB51 datasets.
Effective encoding of high-level temporal cycles enhances video feature learning.
Ablation studies confirm the importance of cycle closure and contrastive loss components.
Abstract
In this paper we show that learning video feature spaces in which temporal cycles are maximally predictable benefits action classification. In particular, we propose a novel learning approach termed Cycle Encoding Prediction (CEP) that is able to effectively represent high-level spatio-temporal structure of unlabelled video content. CEP builds a latent space wherein the concept of closed forward-backward as well as backward-forward temporal loops is approximately preserved. As a self-supervision signal, CEP leverages the bi-directional temporal coherence of the video stream and applies loss functions that encourage both temporal cycle closure and contrastive feature separation. Architecturally, the underpinning network structure utilises a single feature encoder for all video snippets, adding two predictive modules that learn temporal forward and backward transitions. We apply…
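The two loss ideas named in the abstract, temporal cycle closure and contrastive feature separation, can be illustrated with a small sketch. This is not the authors' implementation: the linear forward and backward predictors, the squared-error cycle loss, the InfoNCE-style contrastive loss, and every function name below are assumptions chosen for illustration, since the abstract does not specify the exact formulation.

```python
import math


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def cycle_closure_loss(z, forward, backward):
    """Hypothetical cycle loss: error after forward-backward and backward-forward loops.

    `z` stands in for an encoded video snippet; `forward` and `backward` are
    assumed linear stand-ins for the two predictive modules.
    """
    forward_backward = matvec(backward, matvec(forward, z))  # step ahead, then return
    backward_forward = matvec(forward, matvec(backward, z))  # step back, then return
    return squared_distance(z, forward_backward) + squared_distance(z, backward_forward)


def info_nce_loss(query, positive, negatives, temperature=0.1):
    """Assumed InfoNCE-style contrastive loss with dot-product similarity."""
    def similarity(a, b):
        return sum(x * y for x, y in zip(a, b)) / temperature

    logits = [similarity(query, positive)]
    logits += [similarity(query, negative) for negative in negatives]
    # Numerically stable log-sum-exp over all candidates.
    peak = max(logits)
    log_sum_exp = peak + math.log(sum(math.exp(value - peak) for value in logits))
    # Negative log-probability assigned to the positive candidate.
    return log_sum_exp - logits[0]


if __name__ == "__main__":
    z = [1.0, 0.0]
    double = [[2.0, 0.0], [0.0, 2.0]]
    halve = [[0.5, 0.0], [0.0, 0.5]]
    # Predictors that invert each other close the cycle exactly.
    print(cycle_closure_loss(z, double, halve))
    # A matching positive scores a lower loss than a mismatched one.
    print(info_nce_loss(z, [1.0, 0.0], [[0.0, 1.0]]))
    print(info_nce_loss(z, [0.0, 1.0], [[1.0, 0.0]]))
```

In this toy setting the cycle loss is zero exactly when the backward predictor undoes the forward predictor (and vice versa), which mirrors the abstract's description of a latent space where temporal loops are approximately preserved. A full training objective would presumably combine both terms over features from a learned encoder.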
Taxonomy
Topics: Human Pose and Action Recognition · Anomaly Detection Techniques and Applications · Advanced Vision and Imaging
