Time-Equivariant Contrastive Video Representation Learning
Simon Jenni, Hailin Jin

TL;DR
This paper presents a self-supervised contrastive learning approach that encodes temporal transformations to learn video representations that preserve dynamics, achieving state-of-the-art results in video retrieval and action recognition.
Contribution
It introduces a novel method that enforces temporal equivariance in video representations, unlike previous invariance-based approaches.
Findings
Achieves state-of-the-art results on UCF101, HMDB51, and Diving48.
Effectively encodes temporal transformations for better video understanding.
Improves video retrieval and action recognition performance.
Abstract
We introduce a novel self-supervised contrastive learning method to learn representations from unlabelled videos. Existing approaches ignore the specifics of input distortions, e.g., by learning invariance to temporal transformations. Instead, we argue that video representation should preserve video dynamics and reflect temporal manipulations of the input. Therefore, we exploit novel constraints to build representations that are equivariant to temporal transformations and better capture video dynamics. In our method, relative temporal transformations between augmented clips of a video are encoded in a vector and contrasted with other transformation vectors. To support temporal equivariance learning, we additionally propose the self-supervised classification of two clips of a video into (1) overlapping, (2) ordered, or (3) unordered. Our experiments show that time-equivariant representations…
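The three-way clip classification named in the abstract can be illustrated with a small labelling function. This is a minimal sketch, not the authors' implementation: the abstract does not define the classes precisely, so the sketch assumes that "ordered" means the first clip ends before the second begins and "unordered" means the reverse. The `Clip` frame-range representation, the `clip_pair_label` name, and the label constants are all hypothetical.

```python
from typing import Tuple

# Hypothetical clip representation: (start_frame, end_frame), end exclusive.
Clip = Tuple[int, int]

# Hypothetical label ids for the three classes named in the abstract.
OVERLAPPING, ORDERED, UNORDERED = 0, 1, 2


def clip_pair_label(first: Clip, second: Clip) -> int:
    """Return a self-supervised target for a pair of clips from one video.

    Assumption (not stated in the abstract): "ordered" means `first` ends
    before `second` starts, and "unordered" means the reverse.
    """
    s1, e1 = first
    s2, e2 = second
    if s1 >= e1 or s2 >= e2:
        raise ValueError("each clip must satisfy start < end")
    # Two half-open intervals intersect iff each starts before the other ends.
    if s1 < e2 and s2 < e1:
        return OVERLAPPING
    # Disjoint clips: decide by which one comes first in time.
    if e1 <= s2:
        return ORDERED
    return UNORDERED


if __name__ == "__main__":
    print(clip_pair_label((0, 16), (8, 24)))   # clips share frames 8..15
    print(clip_pair_label((0, 16), (16, 32)))  # first clip precedes second
    print(clip_pair_label((32, 48), (0, 16)))  # first clip follows second
```

Labels of this kind come for free from the sampling positions of the clips, which is what makes the task self-supervised; a classifier head would then predict the label from the two clip embeddings.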
Taxonomy
Topics: Human Pose and Action Recognition · Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning
Methods: Contrastive Learning
