Spatiotemporal Residual Networks for Video Action Recognition
Christoph Feichtenhofer, Axel Pinz, Richard P. Wildes

TL;DR
This paper introduces spatiotemporal Residual Networks that combine two-stream ConvNets and ResNets to improve video action recognition by enabling better spatiotemporal interaction and hierarchical feature learning.
Contribution
The paper presents a novel architecture that integrates residual connections into two-stream ConvNets for spatiotemporal modeling, enhancing action recognition performance.
Findings
Outperforms previous state-of-the-art on benchmark datasets
Effectively models spatiotemporal interactions in videos
Enables end-to-end training of deep spatiotemporal networks
Abstract
Two-stream Convolutional Networks (ConvNets) have shown strong performance for human action recognition in videos. Recently, Residual Networks (ResNets) have arisen as a new technique to train extremely deep architectures. In this paper, we introduce spatiotemporal ResNets as a combination of these two approaches. Our novel architecture generalizes ResNets for the spatiotemporal domain by introducing residual connections in two ways. First, we inject residual connections between the appearance and motion pathways of a two-stream architecture to allow spatiotemporal interaction between the two streams. Second, we transform pretrained image ConvNets into spatiotemporal networks by equipping these with learnable convolutional filters that are initialized as temporal residual connections and operate on adjacent feature maps in time. This approach slowly increases the spatiotemporal…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Code & Models
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsHuman Pose and Action Recognition · Anomaly Detection Techniques and Applications · Gait Recognition and Analysis
MethodsAverage Pooling · *Communicated@Fast*How Do I Communicate to Expedia? · 1x1 Convolution · Batch Normalization · Bottleneck Residual Block · Global Average Pooling · Residual Block · Kaiming Initialization · Max Pooling · Residual Connection
