When Spatial meets Temporal in Action Recognition
Huilin Chen, Lei Wang, Yifan Chen, Tom Gedeon, Piotr Koniusz

TL;DR
This paper introduces the TIME layer, a preprocessing technique that combines spatial and temporal information by rearranging video frames into a spatial grid, improving action recognition accuracy across multiple models and on both RGB and depth video.
Contribution
The paper proposes the TIME layer, a new method for combining spatial and temporal features by rearranging video frames into a spatial grid, enhancing existing models' performance.
Findings
TIME layer improves recognition accuracy in multiple models
Effective integration of spatial and temporal information
Applicable to RGB and depth video data
Abstract
Video action recognition has made significant strides, but challenges remain in effectively using both spatial and temporal information. While existing methods often focus on either spatial features (e.g., object appearance) or temporal dynamics (e.g., motion), they rarely address the need for a comprehensive integration of both. Capturing the rich temporal evolution of video frames, while preserving their spatial details, is crucial for improving accuracy. In this paper, we introduce the Temporal Integration and Motion Enhancement (TIME) layer, a novel preprocessing technique designed to incorporate temporal information. The TIME layer generates new video frames by rearranging the original sequence, preserving temporal order while embedding temporally evolving frames into a single spatial grid of size . This transformation creates new frames that balance both spatial…
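The rearrangement described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. We assume an n×n grid filled with n·n consecutive frames in row-major (left-to-right, top-to-bottom) order, so temporal order is preserved inside each grid, and a sliding window with a hypothetical `stride` parameter that produces a new, still-ordered sequence of grid frames. The function names `tile_frames` and `time_like_rearrange` are hypothetical. The sketch does not resize frames, so each mosaic is n times larger in height and width; the actual layer may downscale frames or choose the grid size differently.

```python
import numpy as np


def tile_frames(frames: np.ndarray, n: int) -> np.ndarray:
    """Tile n*n consecutive frames of shape (H, W, C) into one (n*H, n*W, C) mosaic.

    Frame i lands in grid row i // n and grid column i % n (row-major order).
    Hypothetical helper for illustration only.
    """
    t, h, w, c = frames.shape
    if t != n * n:
        raise ValueError(f"expected {n * n} frames, got {t}")
    # (n*n, H, W, C) -> (n, n, H, W, C): split the frame axis into grid rows and columns
    grid = frames.reshape(n, n, h, w, c)
    # -> (n, H, n, W, C): interleave grid rows with pixel rows, grid columns with pixel columns
    grid = grid.transpose(0, 2, 1, 3, 4)
    # -> (n*H, n*W, C): merge into a single spatial image
    return grid.reshape(n * h, n * w, c)


def time_like_rearrange(video: np.ndarray, n: int, stride: int = 1) -> np.ndarray:
    """Slide a window of n*n frames over a (T, H, W, C) video and tile each window.

    Returns an array of shape (num_windows, n*H, n*W, C); consecutive output frames
    come from consecutive windows, so the new sequence keeps the original temporal order.
    Assumed windowing scheme, not taken from the paper.
    """
    k = n * n
    t = video.shape[0]
    if t < k:
        raise ValueError(f"video has {t} frames, need at least {k}")
    starts = range(0, t - k + 1, stride)
    return np.stack([tile_frames(video[s:s + k], n) for s in starts])


if __name__ == "__main__":
    # Toy video: 16 frames of 2x3 pixels with 1 channel
    video = np.arange(16 * 2 * 3).reshape(16, 2, 3, 1)
    out = time_like_rearrange(video, n=2, stride=4)
    print(out.shape)  # (4, 4, 6, 1): four 2x2 mosaics of 2x3 frames
```

With `stride=k` the windows do not overlap; with `stride=1` every frame appears in several grids at different positions, which trades extra computation for denser temporal coverage. Which variant (if either) matches the TIME layer is not stated in this summary.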
Taxonomy
Topics: Human Pose and Action Recognition · Anomaly Detection Techniques and Applications · Video Surveillance and Tracking Methods
Methods: Attention Is All You Need · Label Smoothing · Dropout · Linear Layer · Byte Pair Encoding · Adam · Residual Connection · Softmax · Position-Wise Feed-Forward Layer · Absolute Position Encodings
