Temporal Transformer Networks with Self-Supervision for Action Recognition
Yongkang Zhang, Jun Li, Guoming Wu, Han Zhang, Zhiping Shi, Zhaoxun Liu, Zizhang Wu

TL;DR
This paper introduces a Temporal Transformer Network with Self-supervision (TTSN) that models long-range temporal dependencies among frames and uses reversed frame sequences as a self-supervision signal, improving action recognition on HMDB51, UCF101, and Something-something V1.
Contribution
The paper presents a novel TTSN architecture combining a temporal transformer and a self-supervision module with sequence reversal, enhancing motion feature modeling and generalization.
Findings
Achieves state-of-the-art results on HMDB51, UCF101, and Something-something V1 datasets.
Effectively models non-linear temporal dependencies among frames.
Improves robustness by reversing frame sequences for self-supervision.
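The reversal idea in the last finding can be illustrated with a minimal sketch. It assumes a binary "was this clip reversed?" pretext label and an independent coin flip per clip. The function name `make_reversal_batch`, the `reverse_prob` parameter, and the sampling scheme are all hypothetical and are not taken from the paper, whose exact selection strategy is not given in this summary.

```python
import random


def make_reversal_batch(clips, reverse_prob=0.5, seed=None):
    """Return (clips, labels) where some clips have their frame order reversed.

    clips: list of clips, each a list of frames in temporal order.
    labels: 1 if the clip was reversed, 0 otherwise (assumed pretext label).
    """
    rng = random.Random(seed)
    out_clips, labels = [], []
    for clip in clips:
        if rng.random() < reverse_prob:
            # Reverse the temporal order of the frames in this clip.
            out_clips.append(list(reversed(clip)))
            labels.append(1)
        else:
            # Keep the original temporal order.
            out_clips.append(list(clip))
            labels.append(0)
    return out_clips, labels


# Usage: frames are stood in for by integers for readability.
batch, labels = make_reversal_batch([[0, 1, 2, 3], [4, 5, 6, 7]], seed=0)
```

A classifier head trained to predict `labels` from the clip features would then have to pick up on motion direction, which is the intuition behind using reversed sequences as a supervision signal.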
Abstract
In recent years, 2D Convolutional Networks-based video action recognition has encouragingly gained wide popularity; however, constrained by the lack of long-range non-linear temporal relation modeling and reverse motion information modeling, the performance of existing models is, therefore, undercut seriously. To address this urgent problem, we introduce a startling Temporal Transformer Network with Self-supervision (TTSN). Our high-performance TTSN mainly consists of a temporal transformer module and a temporal sequence self-supervision module. Concisely speaking, we utilize the efficient temporal transformer module to model the non-linear temporal dependencies among non-local frames, which significantly enhances complex motion feature representations. The temporal sequence self-supervision module we employ unprecedentedly adopts the streamlined strategy of "random batch random…
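As a rough illustration of how attention can relate non-adjacent frames, the sketch below applies standard scaled dot-product self-attention along the time axis of per-frame feature vectors. It is not the paper's module: the learned query, key, and value projections are replaced by the identity, there is a single head, and the name `temporal_self_attention` is hypothetical.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def temporal_self_attention(frames):
    """Single-head self-attention across time.

    frames: list of T per-frame feature vectors, each a list of D floats.
    Assumption: identity Q/K/V projections (a real module would learn them).
    Returns T output vectors, each a weighted mix of all T input frames.
    """
    dim = len(frames[0])
    scale = math.sqrt(dim)
    outputs = []
    for query in frames:
        # Similarity of this frame to every frame, near or far in time.
        scores = [sum(q * k for q, k in zip(query, key)) / scale for key in frames]
        weights = softmax(scores)
        # Weighted sum of frame features using the attention weights.
        mixed = [
            sum(weights[t] * frames[t][j] for t in range(len(frames)))
            for j in range(dim)
        ]
        outputs.append(mixed)
    return outputs


# Usage: three frames with 2-dimensional features.
out = temporal_self_attention([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
```

Because every frame attends to every other frame in one step, the output for frame t can depend on frames far from t, which is the property the abstract refers to as modeling dependencies among non-local frames.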
Taxonomy
Topics: Human Pose and Action Recognition · Gait Recognition and Analysis · Anomaly Detection Techniques and Applications
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Dropout · Label Smoothing · Byte Pair Encoding · Softmax · Absolute Position Encodings · Adam · Position-Wise Feed-Forward Layer
