Long Short View Feature Decomposition via Contrastive Video Representation Learning
Nadine Behrmann, Mohsen Fayyaz, Juergen Gall, Mehdi Noroozi

TL;DR
This paper introduces a contrastive learning method that decomposes video representations into stationary and non-stationary features. Stationary features help video-level action recognition, while non-stationary features help action segmentation.
Contribution
It proposes a contrastive learning framework that separates stationary from non-stationary video features by contrasting long video views with their shorter sub-sequences, so that each downstream task can use the feature type that suits it.
Findings
Stationary features excel in action recognition tasks.
Non-stationary features improve action segmentation.
Learned features distinctly capture static versus dynamic attributes.
Abstract
Self-supervised video representation methods typically focus on the representation of temporal attributes in videos. However, the role of stationary versus non-stationary attributes is less explored: Stationary features, which remain similar throughout the video, enable the prediction of video-level action classes. Non-stationary features, which represent temporally varying attributes, are more beneficial for downstream tasks involving more fine-grained temporal understanding, such as action segmentation. We argue that a single representation to capture both types of features is sub-optimal, and propose to decompose the representation space into stationary and non-stationary features via contrastive learning from long and short views, i.e. long video sequences and their shorter sub-sequences. Stationary features are shared between the short and long views, while non-stationary features…
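The stationary half of the idea, features shared between a long view and its short sub-sequences, can be sketched as a contrastive objective. This is a minimal illustration under stated assumptions, not the authors' implementation: the InfoNCE form of the loss, the temperature value, the equal-length non-overlapping split, and the choice of the first `dim_s` channels as the stationary part are all hypothetical, as are the helper names `split_views`, `stationary_part`, and `info_nce`. The non-stationary objective is not sketched here.

```python
import numpy as np

def split_views(video, num_short):
    """Split a long view of shape (T, ...) into equal, non-overlapping short views.

    Assumption: short views are contiguous sub-sequences of the long view.
    """
    t = video.shape[0] - video.shape[0] % num_short  # drop remainder frames
    return np.split(video[:t], num_short, axis=0)

def stationary_part(features, dim_s):
    """Hypothetical decomposition: treat the first dim_s channels as stationary."""
    return features[..., :dim_s]

def l2_normalize(x, eps=1e-8):
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)

def info_nce(queries, keys, temperature=0.1):
    """InfoNCE loss: queries[i] should match keys[i]; other keys act as negatives."""
    q, k = l2_normalize(queries), l2_normalize(keys)
    logits = q @ k.T / temperature                   # (B, B) similarity matrix
    logits -= logits.max(axis=1, keepdims=True)      # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))               # average over positive pairs

# Stand-in features for 4 videos; a real encoder would produce these.
long_feats = np.eye(4, 8)    # features of the long views
short_feats = np.eye(4, 8)   # features of one short view per video
loss = info_nce(stationary_part(short_feats, 4), stationary_part(long_feats, 4))
```

With this setup, the loss is small when the stationary features of a short view match those of its own long view, and large when they match a different video instead.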
Taxonomy
Topics: Human Pose and Action Recognition · Anomaly Detection Techniques and Applications · Domain Adaptation and Few-Shot Learning
Methods: Contrastive Learning
