TempCLR: Temporal Alignment Representation with Contrastive Learning

Yuncong Yang; Jiawei Ma; Shiyuan Huang; Long Chen; Xudong Lin; Guangxing Han; Shih-Fu Chang

arXiv:2212.13738·cs.CV·March 31, 2023

TL;DR

TempCLR introduces a contrastive learning framework that explicitly aligns full videos with paragraphs by modeling temporal sequences, improving performance in video understanding tasks such as retrieval and action recognition.

Contribution

The paper proposes TempCLR, a novel sequence-level contrastive learning method that explicitly models temporal dynamics using dynamic time warping for better video-paragraph alignment.

Findings

01 Improves video retrieval accuracy

02 Enhances action step localization performance

03 Boosts few-shot action recognition results

Abstract

Video representation learning has been successful in video-text pre-training for zero-shot transfer, where each sentence is trained to be close to the paired video clips in a common feature space. For long videos, given a paragraph of description where the sentences describe different segments of the video, by matching all sentence-clip pairs, the paragraph and the full video are aligned implicitly. However, such unit-level comparison may ignore global temporal context, which inevitably limits the generalization ability. In this paper, we propose a contrastive learning framework TempCLR to compare the full video and the paragraph explicitly. As the video/paragraph is formulated as a sequence of clips/sentences, under the constraint of their temporal order, we use dynamic time warping to compute the minimum cumulative cost over sentence-clip pairs as the sequence-level distance. To…
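The sequence-level distance described in the abstract can be sketched in a few lines. This is a minimal illustration of classic dynamic time warping over sentence-clip pairs, not the authors' implementation: the cosine-distance cost, the three allowed step moves, and the function names `cosine_distance` and `dtw_distance` are all assumptions made for this example, and the embeddings are assumed to be non-zero vectors.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def cosine_distance(a: Vector, b: Vector) -> float:
    """Pairwise cost (assumed here): 1 - cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm


def dtw_distance(sentences: Sequence[Vector], clips: Sequence[Vector]) -> float:
    """Minimum cumulative cost of aligning sentences to clips in temporal order."""
    n, m = len(sentences), len(clips)
    # acc[i][j] holds the best cost of aligning the first i sentences
    # with the first j clips; the extra row and column are boundaries.
    acc = [[math.inf] * (m + 1) for _ in range(n + 1)]
    acc[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = cosine_distance(sentences[i - 1], clips[j - 1])
            # Order-preserving moves: advance the sentence, the clip, or both.
            acc[i][j] = cost + min(acc[i - 1][j], acc[i][j - 1], acc[i - 1][j - 1])
    return acc[n][m]
```

Because every warping path must move forward in both sequences, a paragraph whose sentences appear in the same order as the matching clips gets a lower distance than the same sentences shuffled. A contrastive objective built on this distance could then treat the correctly ordered video-paragraph pair as the positive and other pairings as negatives.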

Code & Models

Repositories

yyuncong/tempclr
PyTorch · Official

Videos

TempCLR: Temporal Alignment Representation with Contrastive Learning · SlidesLive

Taxonomy

Topics: Human Pose and Action Recognition · Video Analysis and Summarization · Multimodal Machine Learning Applications

Methods: Contrastive Learning