Long Short View Feature Decomposition via Contrastive Video Representation Learning

Nadine Behrmann; Mohsen Fayyaz; Juergen Gall; Mehdi Noroozi

arXiv:2109.11593·cs.CV·September 27, 2021

TL;DR

This paper introduces a contrastive learning method that decomposes video representations into stationary and non-stationary features, improving downstream tasks like action recognition and segmentation by capturing different temporal attributes.

Contribution

The paper proposes a contrastive learning framework that separates stationary from non-stationary video features by contrasting long video sequences with their shorter sub-sequences, so that each downstream task can draw on the feature type that suits it.

Findings

01 Stationary features excel in action recognition tasks.

02 Non-stationary features improve action segmentation.

03 Learned features distinctly capture static versus dynamic attributes.

Abstract

Self-supervised video representation methods typically focus on the representation of temporal attributes in videos. However, the role of stationary versus non-stationary attributes is less explored: Stationary features, which remain similar throughout the video, enable the prediction of video-level action classes. Non-stationary features, which represent temporally varying attributes, are more beneficial for downstream tasks involving more fine-grained temporal understanding, such as action segmentation. We argue that a single representation to capture both types of features is sub-optimal, and propose to decompose the representation space into stationary and non-stationary features via contrastive learning from long and short views, i.e. long video sequences and their shorter sub-sequences. Stationary features are shared between the short and long views, while non-stationary features…
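
The idea that stationary features are shared between the short and long views can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the paper: the function names (`split_features`, `info_nce`, `stationary_loss`), the choice of the first `n_stationary` dimensions as the stationary part, cosine similarity, the temperature value, and the use of an InfoNCE-style loss with in-batch negatives. Only the stationary part is sketched; the objective for the non-stationary part is not shown.

```python
import math

def split_features(z, n_stationary):
    """Split one feature vector into (stationary, non_stationary) parts.
    Using the first n_stationary dimensions is an illustrative assumption."""
    return z[:n_stationary], z[n_stationary:]

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b + 1e-12)

def info_nce(anchors, positives, temperature=0.1):
    """Mean InfoNCE-style loss: positives[i] is the match for anchors[i],
    and every other entry of positives acts as a negative."""
    total = 0.0
    for i, anchor in enumerate(anchors):
        logits = [cosine(anchor, p) / temperature for p in positives]
        peak = max(logits)  # subtract the max for numerical stability
        log_denominator = peak + math.log(sum(math.exp(l - peak) for l in logits))
        total += log_denominator - logits[i]
    return total / len(anchors)

def stationary_loss(long_feats, short_feats, n_stationary, temperature=0.1):
    """Pull the stationary part of each short view towards the stationary
    part of the long view taken from the same video."""
    long_s = [split_features(z, n_stationary)[0] for z in long_feats]
    short_s = [split_features(z, n_stationary)[0] for z in short_feats]
    return info_nce(short_s, long_s, temperature)

# Toy features for two videos: 2 stationary dims followed by 2 other dims.
long_feats = [[1.0, 0.0, 5.0, 5.0], [0.0, 1.0, -3.0, 2.0]]
short_feats = [[0.9, 0.1, 0.0, 0.0], [0.1, 0.9, 1.0, 1.0]]

loss_matched = stationary_loss(long_feats, short_feats, n_stationary=2)
loss_mismatched = stationary_loss(long_feats[::-1], short_feats, n_stationary=2)
print(loss_matched < loss_mismatched)
```

In this toy setup the loss is small when each short view is paired with the long view of its own video and large when the pairing is swapped, and the remaining dimensions do not affect it at all.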

Taxonomy

Topics: Human Pose and Action Recognition · Anomaly Detection Techniques and Applications · Domain Adaptation and Few-Shot Learning

Methods: Contrastive Learning