TS-LSTM and Temporal-Inception: Exploiting Spatiotemporal Dynamics for Activity Recognition
Chih-Yao Ma, Min-Hung Chen, Zsolt Kira, Ghassan AlRegib

TL;DR
This paper explores and compares different deep neural network architectures, including RNNs and Temporal-ConvNets, for capturing spatiotemporal dynamics in video-based activity recognition, achieving state-of-the-art results.
Contribution
It systematically evaluates two-stream ConvNets with RNNs and Temporal-ConvNets, proposing two new architectures to better exploit spatiotemporal information in videos.
Findings
Both RNNs and Temporal-ConvNets improve activity recognition performance.
Proper data segmentation is crucial for LSTM-based methods.
The proposed architectures achieve state-of-the-art accuracy on the UCF101 and HMDB51 datasets.
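To make the segmentation finding concrete, here is a minimal sketch of splitting a video's per-frame feature vectors into contiguous temporal segments and pooling each one before it would be passed to a recurrent model. It is illustrative only and not the authors' implementation: the function names, the near-equal split, and the choice of mean pooling are all assumptions, since this summary does not specify them.

```python
from typing import List

Vector = List[float]


def split_into_segments(frames: List[Vector], num_segments: int) -> List[List[Vector]]:
    """Split per-frame feature vectors into contiguous, near-equal temporal segments.

    Hypothetical helper: the segmentation scheme is an assumption.
    """
    n = len(frames)
    if num_segments < 1 or num_segments > n:
        raise ValueError("num_segments must be between 1 and the number of frames")
    # Evenly spaced boundaries; with num_segments <= n every segment is non-empty.
    bounds = [i * n // num_segments for i in range(num_segments + 1)]
    return [frames[bounds[i]:bounds[i + 1]] for i in range(num_segments)]


def mean_pool(segment: List[Vector]) -> Vector:
    """Average the feature vectors of one segment, dimension by dimension."""
    dim = len(segment[0])
    return [sum(frame[d] for frame in segment) / len(segment) for d in range(dim)]


def segment_features(frames: List[Vector], num_segments: int) -> List[Vector]:
    """Return one pooled vector per segment (the sequence a recurrent model would consume)."""
    return [mean_pool(seg) for seg in split_into_segments(frames, num_segments)]


# Toy input: 6 frames with 2-dimensional features (made-up values).
toy_frames = [[float(t), 2.0 * t] for t in range(6)]
pooled = segment_features(toy_frames, num_segments=3)
```

With this layout, the number of segments controls how long the sequence seen by the recurrent model is, which is the kind of design choice the finding above refers to.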
Abstract
Recent two-stream deep Convolutional Neural Networks (ConvNets) have made significant progress in recognizing human actions in videos. Despite their success, methods extending the basic two-stream ConvNet have not systematically explored possible network architectures to further exploit spatiotemporal dynamics within video sequences. Further, such networks often use different baseline two-stream networks. Therefore, the differences and the distinguishing factors between various methods using Recurrent Neural Networks (RNN) or convolutional networks on temporally-constructed feature vectors (Temporal-ConvNet) are unclear. In this work, we first demonstrate a strong baseline two-stream ConvNet using ResNet-101. We use this baseline to thoroughly examine the use of both RNNs and Temporal-ConvNets for extracting spatiotemporal information. Building upon our experimental results, we then…
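The idea of a convolution over temporally-constructed feature vectors can be illustrated with a small sketch: per-frame feature vectors are stacked in time order and a 1D kernel slides along the time axis. This is an assumption-laden toy, not the paper's Temporal-ConvNet. The function name is hypothetical, and the kernel here is fixed and shared across feature dimensions, whereas a trained layer would learn its weights.

```python
from typing import List

Vector = List[float]


def temporal_conv1d(features: List[Vector], kernel: List[float]) -> List[Vector]:
    """Slide a 1D kernel along the time axis of a T x D feature matrix.

    No padding, stride 1, no kernel flip (cross-correlation, as in most
    deep learning libraries). Each feature dimension is filtered
    independently with the same kernel. Output has T - K + 1 time steps.
    """
    k = len(kernel)
    t = len(features)
    if k < 1 or k > t:
        raise ValueError("kernel length must be between 1 and the number of time steps")
    dim = len(features[0])
    out = []
    for start in range(t - k + 1):
        window = features[start:start + k]
        # Weighted sum over the K consecutive time steps in the window.
        out.append([sum(kernel[j] * window[j][d] for j in range(k)) for d in range(dim)])
    return out


# Toy input: 4 time steps, 2 feature dimensions (made-up values).
toy_features = [[1.0, 0.0], [2.0, 1.0], [4.0, 1.0], [7.0, 3.0]]
# A difference kernel responds to change between consecutive time steps.
response = temporal_conv1d(toy_features, kernel=[-1.0, 1.0])
```

The kernel length sets how many consecutive time steps each output value summarizes, so stacking such layers or using several kernel lengths is one way a network could capture dynamics at different temporal scales.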
Taxonomy
Topics: Human Pose and Action Recognition · Anomaly Detection Techniques and Applications · Video Surveillance and Tracking Methods
