Temporal Transformer Networks with Self-Supervision for Action Recognition

Yongkang Zhang; Jun Li; Guoming Wu; Han Zhang; Zhiping Shi; Zhaoxun Liu; Zizhang Wu

arXiv:2112.07338·cs.CV·December 20, 2021·1 cites

Open Access

TL;DR

This paper introduces a Temporal Transformer Network with Self-supervision (TTSN) that models long-range temporal dependencies and reverses motion sequences, significantly improving action recognition performance on multiple datasets.

Contribution

The paper presents a novel TTSN architecture combining a temporal transformer and a self-supervision module with sequence reversal, enhancing motion feature modeling and generalization.

Findings

01. Achieves state-of-the-art results on HMDB51, UCF101, and Something-something V1 datasets.

02. Effectively models non-linear temporal dependencies among frames.

03. Improves robustness by reversing frame sequences for self-supervision.
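
The idea of reversing frame sequences as a self-supervision signal can be illustrated with a small, generic sketch. This is not the paper's procedure: the helper `make_reversal_batch`, its `reverse_prob` parameter, and the binary forward/reversed labels are all assumptions made for illustration, and frames are stood in for by plain strings.

```python
import random


def make_reversal_batch(clips, reverse_prob=0.5, seed=None):
    """Randomly reverse the frame order of clips in a batch.

    clips: list of clips, each a list of frames in temporal order.
    Returns (new_clips, labels), where label 1 marks a reversed clip
    and label 0 marks a clip left in its original order.

    Hypothetical helper: the sampling rule and the labels are
    assumptions for illustration, not the paper's method.
    """
    rng = random.Random(seed)
    new_clips, labels = [], []
    for clip in clips:
        if rng.random() < reverse_prob:
            new_clips.append(clip[::-1])  # play the clip backwards
            labels.append(1)
        else:
            new_clips.append(list(clip))  # keep the original order
            labels.append(0)
    return new_clips, labels


# Toy batch of three clips with three "frames" each.
batch = [["f0", "f1", "f2"], ["g0", "g1", "g2"], ["h0", "h1", "h2"]]
augmented, labels = make_reversal_batch(batch, reverse_prob=0.5, seed=0)
```

A model trained to predict `labels` from `augmented` has to attend to the direction of motion, which is one plausible way reversed sequences could act as a supervisory signal.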

Abstract

In recent years, video action recognition based on 2D convolutional networks has gained wide popularity; however, constrained by the lack of long-range non-linear temporal relation modeling and reverse motion information modeling, the performance of existing models is seriously undercut. To address this urgent problem, we introduce a startling Temporal Transformer Network with Self-supervision (TTSN). Our high-performance TTSN mainly consists of a temporal transformer module and a temporal sequence self-supervision module. Concisely speaking, we utilize the efficient temporal transformer module to model the non-linear temporal dependencies among non-local frames, which significantly enhances complex motion feature representations. The temporal sequence self-supervision module we employ unprecedentedly adopts the streamlined strategy of "random batch random…
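
To make "non-linear temporal dependencies among non-local frames" concrete, here is a minimal sketch of single-head self-attention applied along the time axis of per-frame features. It is a simplified illustration rather than the TTSN module itself: the function name `temporal_self_attention` is hypothetical, and queries, keys, and values are taken to be the raw frame features (no learned projections, no positional encoding), which is an assumption made to keep the example short.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def temporal_self_attention(frames):
    """Single-head scaled dot-product self-attention across time.

    frames: list of T per-frame feature vectors (each a list of D floats),
    for example pooled outputs of a 2D CNN backbone.
    Assumption: queries, keys, and values are the raw features.
    Returns (outputs, weights); weights[t][s] is how strongly frame t
    attends to frame s, regardless of how far apart they are in time.
    """
    dim = len(frames[0])
    scale = 1.0 / math.sqrt(dim)
    outputs, weights = [], []
    for query in frames:
        # Similarity of this frame to every frame in the clip.
        scores = [scale * sum(q * k for q, k in zip(query, key)) for key in frames]
        attn = softmax(scores)
        # Weighted mix of all frame features.
        mixed = [sum(a * frame[d] for a, frame in zip(attn, frames)) for d in range(dim)]
        weights.append(attn)
        outputs.append(mixed)
    return outputs, weights


# Four toy frames; frames 0 and 2 share the same feature vector.
frames = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.5, 0.5]]
outputs, weights = temporal_self_attention(frames)
```

In this toy input, frame 0 attends to the non-adjacent frame 2 as strongly as to itself and more strongly than to its neighbor frame 1, which is the kind of non-local relation that a fixed local temporal window would miss.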

Taxonomy

Topics: Human Pose and Action Recognition · Gait Recognition and Analysis · Anomaly Detection Techniques and Applications

Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Dropout · Label Smoothing · Byte Pair Encoding · Softmax · Absolute Position Encodings · Adam · Position-Wise Feed-Forward Layer