TSM: Temporal Shift Module for Efficient Video Understanding
Ji Lin, Chuang Gan, Song Han

TL;DR
The paper introduces the Temporal Shift Module (TSM), which shifts part of the channels in a 2D CNN along the temporal dimension so that neighboring frames exchange information, giving efficient and accurate video understanding without extra computation.
Contribution
It proposes TSM, a simple yet effective module that allows 2D CNNs to model temporal relationships, matching 3D CNN performance with much lower computational cost.
Findings
Ranked first on the Something-Something leaderboard upon publication.
Runs with a latency of 13 ms on Jetson Nano.
Enables real-time online video recognition and detection.
Abstract
The explosive growth in video streaming gives rise to challenges in performing video understanding at high accuracy and low computation cost. Conventional 2D CNNs are computationally cheap but cannot capture temporal relationships; 3D CNN based methods can achieve good performance but are computationally intensive, making them expensive to deploy. In this paper, we propose a generic and effective Temporal Shift Module (TSM) that enjoys both high efficiency and high performance. Specifically, it can achieve the performance of 3D CNNs while maintaining 2D CNN complexity. TSM shifts part of the channels along the temporal dimension, thus facilitating information exchange among neighboring frames. It can be inserted into 2D CNNs to achieve temporal modeling at zero computation and zero parameters. We also extend TSM to the online setting, which enables real-time low-latency online video recognition…
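The channel shift described in the abstract can be sketched in a few lines of plain Python. This is a minimal illustration, not the authors' implementation: the function names `temporal_shift` and `online_shift`, the `shift_div` parameter, the default fraction of 1/8 of the channels per direction, and the zero padding at clip boundaries are all assumptions made for the example. The uni-directional variant is one plausible reading of the online setting, where future frames are unavailable and only cached channels from the previous frame can be mixed in.

```python
def temporal_shift(frames, shift_div=8, pad=0):
    """Bidirectional shift over a whole clip (offline sketch).

    frames: list of T frames, each a list of C per-channel values.
    shift_div: assumed knob; C // shift_div channels move in each direction.
    """
    num_frames = len(frames)
    num_channels = len(frames[0])
    fold = num_channels // shift_div
    out = []
    for t in range(num_frames):
        shifted = []
        for c in range(num_channels):
            if c < fold:
                src = t + 1   # first group: take the value from the next frame
            elif c < 2 * fold:
                src = t - 1   # second group: take the value from the previous frame
            else:
                src = t       # remaining channels stay in place
            # Frames outside the clip contribute padding.
            shifted.append(frames[src][c] if 0 <= src < num_frames else pad)
        out.append(shifted)
    return out


def online_shift(frame, cache, shift_div=8, pad=0):
    """Uni-directional shift for streaming input (online sketch).

    Replaces the first group of channels with the values cached from the
    previous frame and returns the new cache for the next call.
    """
    fold = len(frame) // shift_div
    if cache is None:
        cache = [pad] * fold          # no history yet for the first frame
    out = list(cache) + frame[fold:]  # past channels + current channels
    new_cache = frame[:fold]          # remember current channels for next frame
    return out, new_cache


# Toy clip: 3 frames, 4 channels; each value encodes frame * 10 + channel.
clip = [[t * 10 + c for c in range(4)] for t in range(3)]
offline = temporal_shift(clip, shift_div=4)

cache = None
online = []
for frame in clip:
    mixed, cache = online_shift(frame, cache, shift_div=4)
    online.append(mixed)
```

Both functions only move existing values between frames, so they add no multiplications and no learnable parameters, which is the property the abstract refers to as zero computation and zero parameters. In a real network the per-channel values would be feature maps rather than scalars, and the shift would be applied before a 2D convolution.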
Code & Models
Videos
Google Maps Navigation with Online Temporal Shift Module on NVIDIA Jetson Nano · YouTube
Google Maps Navigation with Online Temporal Shift Module · YouTube
TSM: Temporal Shift Module for Efficient Video Understanding, online demo with NVIDIA Nano · YouTube
Online Detection with Uni-directional TSM · YouTube
Taxonomy
Topics: Human Pose and Action Recognition · Video Surveillance and Tracking Methods · Advanced Neural Network Applications
