Temporal Segment Networks for Action Recognition in Videos
Limin Wang, Yuanjun Xiong, Zhe Wang, Yu Qiao, Dahua Lin, Xiaoou Tang, Luc Van Gool

TL;DR
This paper introduces Temporal Segment Networks (TSN), a flexible framework for action recognition in videos that models long-range temporal structures, achieving state-of-the-art results on multiple benchmarks with efficient computation.
Contribution
The paper proposes a novel segment-based sampling and aggregation method within TSN, enabling effective learning of long-range temporal dependencies in videos for action recognition.
Findings
Achieved state-of-the-art accuracy on four benchmarks.
Efficiently models long-range temporal structures in videos.
Won the ActivityNet challenge 2016 video classification track.
Abstract
Deep convolutional networks have achieved great success for image recognition. However, for action recognition in videos, their advantage over traditional methods is not so evident. We present a general and flexible video-level framework for learning action models in videos. This method, called temporal segment network (TSN), aims to model long-range temporal structures with a new segment-based sampling and aggregation module. This unique design enables our TSN to efficiently learn action models by using the whole action videos. The learned models could be easily adapted for action recognition in both trimmed and untrimmed videos with simple average pooling and multi-scale temporal window integration, respectively. We also study a series of good practices for the instantiation of TSN framework given limited training samples. Our approach obtains the state-of-the-art performance on four…
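The segment-based sampling and average-pooling aggregation described in the abstract can be illustrated with a minimal sketch. It assumes the video is split into equal-length segments, one frame index is drawn at random from each segment, and per-snippet class scores are averaged into a video-level score. The function names, the default of three segments, and the dummy scores are illustrative assumptions, not taken from the paper's code.

```python
import numpy as np


def sample_segment_indices(num_frames, num_segments=3, rng=None):
    """Draw one frame index from each of `num_segments` equal-length segments.

    Hypothetical helper for illustration; not the authors' implementation.
    """
    if num_frames < num_segments:
        raise ValueError("need at least one frame per segment")
    rng = np.random.default_rng() if rng is None else rng
    # Integer segment boundaries covering the range [0, num_frames).
    edges = np.linspace(0, num_frames, num_segments + 1).astype(int)
    # One random index per segment; the upper edge is exclusive.
    return np.array([rng.integers(lo, hi) for lo, hi in zip(edges[:-1], edges[1:])])


def average_consensus(snippet_scores):
    """Average per-snippet class scores into a single video-level score vector."""
    return np.asarray(snippet_scores, dtype=float).mean(axis=0)


# Usage: sample 3 snippets from a 90-frame video, then pool dummy class scores.
indices = sample_segment_indices(90, num_segments=3, rng=np.random.default_rng(0))
dummy_scores = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # placeholder network outputs
video_score = average_consensus(dummy_scores)
```

Because each sampled index comes from a different segment, the snippets span the whole video regardless of its length, while the number of network evaluations stays fixed at the number of segments.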
Taxonomy
Topics: Human Pose and Action Recognition · Multimodal Machine Learning Applications · Gait Recognition and Analysis
Methods: Average Pooling
