Context-Aware Network Based on Multi-scale Spatio-temporal Attention for Action Recognition in Videos
Xiaoyang Li, Wenzhu Yang, Kanglin Wang, Tiebiao Wang, Qingsong Fei

TL;DR
This paper introduces the Context-Aware Network (CAN), which captures spatio-temporal cues at multiple scales for video action recognition and reports competitive accuracy against mainstream methods on five benchmark datasets.
Contribution
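The paper proposes CAN, a network built from two modules: the Multi-scale Temporal Cue Module (MTCM), which extracts temporal cues at multiple scales, and the Group Spatial Cue Module (GSCM), which groups feature maps and extracts spatial cues at a different scale from each group.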
Findings
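Reports competitive performance on five benchmark datasets: Something-Something V1 and V2, Diving48, Kinetics-400, and UCF101.
Captures temporal cues ranging from fast-changing motion details to the overall action flow, together with spatial cues at different scales.
Outperforms most mainstream methods in accuracy.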
Abstract
Action recognition is a critical task in video understanding, requiring the comprehensive capture of spatio-temporal cues across various scales. However, existing methods often overlook the multi-granularity nature of actions. To address this limitation, we introduce the Context-Aware Network (CAN). CAN consists of two core modules: the Multi-scale Temporal Cue Module (MTCM) and the Group Spatial Cue Module (GSCM). MTCM effectively extracts temporal cues at multiple scales, capturing both fast-changing motion details and overall action flow. GSCM, on the other hand, extracts spatial cues at different scales by grouping feature maps and applying specialized extraction methods to each group. Experiments conducted on five benchmark datasets (Something-Something V1 and V2, Diving48, Kinetics-400, and UCF101) demonstrate the effectiveness of CAN. Our approach achieves competitive…
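The abstract names the two modules but gives no layer-level detail, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes NumPy, a clip tensor of shape (T, C, H, W), frame differences at several strides as a stand-in for multi-scale temporal cues, and box filters of different sizes as a stand-in for the "specialized extraction methods" applied to each channel group. The function names, strides, and kernel sizes are all hypothetical.

```python
import numpy as np


def multi_scale_temporal_cues(frames, strides=(1, 2, 4)):
    """Illustrative stand-in for multi-scale temporal cues (not the paper's MTCM).

    frames: array of shape (T, C, H, W).
    Returns a dict mapping each stride s to frames[t + s] - frames[t].
    Small strides respond to fast motion, large strides to slower overall change.
    """
    frames = np.asarray(frames, dtype=float)
    cues = {}
    for s in strides:
        diff = np.zeros_like(frames)
        # The last s frames have no partner s steps ahead, so they stay zero.
        diff[:-s] = frames[s:] - frames[:-s]
        cues[s] = diff
    return cues


def box_filter(x, k):
    """Average each pixel over a k x k window (odd k), using edge padding."""
    pad = k // 2
    widths = [(0, 0)] * (x.ndim - 2) + [(pad, pad), (pad, pad)]
    padded = np.pad(x, widths, mode="edge")
    h, w = x.shape[-2:]
    out = np.zeros_like(x, dtype=float)
    for dy in range(k):
        for dx in range(k):
            out += padded[..., dy:dy + h, dx:dx + w]
    return out / (k * k)


def group_spatial_cues(x, kernel_sizes=(1, 3, 5)):
    """Illustrative stand-in for grouped spatial cues (not the paper's GSCM).

    Splits the channel axis into one group per kernel size, filters each
    group at its own spatial scale, and concatenates the groups again.
    """
    x = np.asarray(x, dtype=float)
    groups = np.array_split(x, len(kernel_sizes), axis=1)
    filtered = [box_filter(g, k) for g, k in zip(groups, kernel_sizes)]
    return np.concatenate(filtered, axis=1)
```

For a clip with T=8 and C=6, `multi_scale_temporal_cues` returns three difference tensors with the same shape as the input, and `group_spatial_cues` returns a tensor of the same shape in which channels 0-1, 2-3, and 4-5 have been averaged over 1x1, 3x3, and 5x5 windows respectively. A learned model would presumably replace both the fixed differences and the box filters with trainable layers.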
Taxonomy
Topics: Human Pose and Action Recognition · Multimodal Machine Learning Applications · Action Observation and Synchronization
