When Spatial meets Temporal in Action Recognition

Huilin Chen; Lei Wang; Yifan Chen; Tom Gedeon; Piotr Koniusz

arXiv:2411.15284·cs.CV·November 26, 2024

Open Access

TL;DR

This paper introduces the TIME layer, a preprocessing technique that rearranges video frames into a spatial grid so that spatial and temporal information share a single input, improving action recognition accuracy across multiple models on both RGB and depth video.

Contribution

The paper proposes the TIME layer, a new method for combining spatial and temporal features by rearranging video frames into a spatial grid, enhancing existing models' performance.

Findings

01 TIME layer improves recognition accuracy in multiple models

02 Effective integration of spatial and temporal information

03 Applicable to RGB and depth video data

Abstract

Video action recognition has made significant strides, but challenges remain in effectively using both spatial and temporal information. While existing methods often focus on either spatial features (e.g., object appearance) or temporal dynamics (e.g., motion), they rarely address the need for a comprehensive integration of both. Capturing the rich temporal evolution of video frames, while preserving their spatial details, is crucial for improving accuracy. In this paper, we introduce the Temporal Integration and Motion Enhancement (TIME) layer, a novel preprocessing technique designed to incorporate temporal information. The TIME layer generates new video frames by rearranging the original sequence, preserving temporal order while embedding $N^{2}$ temporally evolving frames into a single spatial grid of size $N \times N$. This transformation creates new frames that balance both spatial…
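The grid rearrangement described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `frames_to_grid` is hypothetical, frames are modeled as plain nested lists of pixel values, the grid is assumed to be filled left to right and then top to bottom, and no resizing of the tiled frame is performed.

```python
def frames_to_grid(frames, n):
    """Tile n*n equally sized frames into one n-by-n grid.

    Assumption (not confirmed by the source): frames are placed in
    row-major order, so temporal order runs left to right, then top
    to bottom. Each frame is a list of rows; each row is a list of
    pixel values.
    """
    if len(frames) != n * n:
        raise ValueError(f"expected {n * n} frames, got {len(frames)}")
    height = len(frames[0])
    grid = []
    for grid_row in range(n):
        # The n consecutive frames that share this row of the grid.
        row_frames = frames[grid_row * n:(grid_row + 1) * n]
        for y in range(height):
            merged = []
            for frame in row_frames:
                merged.extend(frame[y])  # place frames side by side
            grid.append(merged)
    return grid


# Four tiny 1x2 "frames", each filled with its own time index.
frames = [[[k, k]] for k in range(4)]
grid = frames_to_grid(frames, 2)
print(grid)  # [[0, 0, 1, 1], [2, 2, 3, 3]]
```

With $N = 2$, frames 0 and 1 form the top row of the grid and frames 2 and 3 the bottom row, so a single output frame holds four time steps while each tile keeps its original spatial layout.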

Taxonomy

Topics: Human Pose and Action Recognition · Anomaly Detection Techniques and Applications · Video Surveillance and Tracking Methods

Methods: Attention Is All You Need · Label Smoothing · Dropout · Linear Layer · Byte Pair Encoding · Adam · Residual Connection · Softmax · Position-Wise Feed-Forward Layer · Absolute Position Encodings