Unsupervised Pre-training for Temporal Action Localization Tasks

Can Zhang; Tianyu Yang; Junwu Weng; Meng Cao; Jue Wang; Yuexian Zou

arXiv:2203.13609 · cs.CV · March 28, 2022 · 5 citations

Open Access · 1 repository

TL;DR

This paper introduces a self-supervised pretext task, Pseudo Action Localization (PAL), for pre-training video feature encoders for temporal action localization. PAL takes temporal regions from one video as pseudo actions, pastes them into two other videos, and aligns the features of the pasted regions across the two synthetic videos.

Contribution

The authors present what they describe as the first attempt at a self-supervised pre-training method tailored to temporal action localization, aimed at bridging the gap between video-level classification and clip-level localization.

Findings

01. Pre-training with PAL on large-scale unlabeled video is reported to improve temporal action localization (TAL) performance.

02. The method introduces a temporally equivariant contrastive learning paradigm.

03. The authors report extensive experiments in which PAL outperforms existing approaches.

Abstract

Unsupervised video representation learning has made remarkable achievements in recent years. However, most existing methods are designed and optimized for video classification. These pre-trained models can be sub-optimal for temporal localization tasks due to the inherent discrepancy between video-level classification and clip-level localization. To bridge this gap, we make the first attempt to propose a self-supervised pretext task, coined as Pseudo Action Localization (PAL), to Unsupervisedly Pre-train feature encoders for Temporal Action Localization tasks (UP-TAL). Specifically, we first randomly select temporal regions, each of which contains multiple clips, from one video as pseudo actions and then paste them onto different temporal positions of the other two videos. The pretext task is to align the features of pasted pseudo action regions from two synthetic videos and maximize the…
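
The data-construction step described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the function name `make_synthetic_pair`, the list-of-clips video representation, the choice of a single region, and the choice to overwrite background clips (rather than insert between them) are all assumptions made for illustration. The alignment objective is not sketched, because the abstract text available here is truncated before it is fully stated.

```python
import random


def make_synthetic_pair(source, background_a, background_b, region_len, rng):
    """Cut one pseudo action region out of `source` and paste it into two
    background videos at independently sampled temporal positions.

    Videos are plain lists of clips (hypothetical representation). Pasting
    overwrites background clips so each video keeps its length; this is an
    assumption, not something the abstract specifies.
    """
    shortest = min(len(source), len(background_a), len(background_b))
    if not 0 < region_len <= shortest:
        raise ValueError("region_len must be positive and fit in every video")

    # Randomly select a temporal region (several consecutive clips).
    src_start = rng.randrange(len(source) - region_len + 1)
    region = source[src_start:src_start + region_len]

    pair = []
    for background in (background_a, background_b):
        # Each synthetic video gets its own randomly sampled paste position.
        dst_start = rng.randrange(len(background) - region_len + 1)
        synthetic = list(background)  # copy so the input is left untouched
        synthetic[dst_start:dst_start + region_len] = region
        pair.append((synthetic, (dst_start, dst_start + region_len)))
    return region, pair


# Usage with toy "clips" represented by string labels.
rng = random.Random(0)
source = [f"s{i}" for i in range(8)]
video_a = [f"a{i}" for i in range(10)]
video_b = [f"b{i}" for i in range(10)]
region, pair = make_synthetic_pair(source, video_a, video_b, 3, rng)
for synthetic, (start, end) in pair:
    print(start, end, synthetic)
```

The returned `(start, end)` spans mark where the pseudo action sits in each synthetic video, which is the information a pretext task of this kind would need in order to compare the pasted regions' features across the two videos.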

Code & Models

Repositories

zhang-can/up-tal
PyTorch · Official

Taxonomy

Topics: Human Pose and Action Recognition · Multimodal Machine Learning Applications · Advanced Vision and Imaging

Methods: Contrastive Learning · ALIGN