Efficient Spatialtemporal Context Modeling for Action Recognition
Congqi Cao, Yue Lu, Yifan Zhang, Dongmei Jiang, Yanning Zhang

TL;DR
This paper introduces RCCA-3D, a recurrent 3D criss-cross attention module that efficiently models long-range spatiotemporal context in videos, improving action recognition performance while reducing computational costs.
Contribution
The paper proposes a novel recurrent 3D criss-cross attention mechanism that captures dense long-range spatiotemporal relations efficiently for action recognition.
Findings
RCCA-3D reduces the number of parameters and FLOPs by 25% and 30%, respectively, compared with the non-local method.
The method outperforms state-of-the-art approaches on three datasets.
Optimal relation map factorization improves recognition accuracy.
Abstract
Contextual information plays an important role in action recognition. Local operations have difficulty modeling the relation between two elements separated by a long distance. However, directly modeling the contextual information between any two points incurs a huge cost in computation and memory, especially for action recognition, where there is an additional temporal dimension. Inspired by the 2D criss-cross attention used in segmentation, we propose a recurrent 3D criss-cross attention (RCCA-3D) module to model the dense long-range spatiotemporal contextual information in video for action recognition. The global context is factorized into sparse relation maps. We model the relationship between points on the same line along the horizontal, vertical, and depth directions each time, which forms a 3D criss-cross structure, and duplicate the same operation with a recurrent mechanism to…
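To make the criss-cross idea in the abstract concrete, here is a minimal NumPy sketch of one 3D criss-cross pass and its repeated application. It is not the authors' implementation. The function names, the projection matrices `wq`, `wk`, `wv`, the joint softmax over the three axes, the masking that counts each position's own entry only once, and the plain residual connection are all assumptions made for this illustration.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax; entries equal to -inf receive weight 0.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def criss_cross_attention_3d(x, wq, wk, wv):
    """One criss-cross pass over a (T, H, W, C) feature volume (hypothetical helper).

    Each position attends only to positions that share two of its three
    coordinates, i.e. the three axis-aligned lines passing through it.
    """
    T, H, W, _ = x.shape
    q, k, v = x @ wq, x @ wk, x @ wv  # linear projections (assumed form)

    # Affinities along the temporal (depth), vertical and horizontal lines.
    e_t = np.einsum("thwc,shwc->thws", q, k)  # (T, H, W, T)
    e_h = np.einsum("thwc,tgwc->thwg", q, k)  # (T, H, W, H)
    e_w = np.einsum("thwc,thvc->thwv", q, k)  # (T, H, W, W)

    # A position lies on all three of its lines; keep its own entry only in
    # the temporal map so it is not counted three times (assumption).
    e_h = e_h + np.where(np.eye(H, dtype=bool), -np.inf, 0.0)[None, :, None, :]
    e_w = e_w + np.where(np.eye(W, dtype=bool), -np.inf, 0.0)[None, None, :, :]

    # Joint softmax over the T + H + W - 2 surviving entries (assumption).
    a = softmax(np.concatenate([e_t, e_h, e_w], axis=-1))
    a_t, a_h, a_w = np.split(a, [T, T + H], axis=-1)

    out = (
        np.einsum("thws,shwc->thwc", a_t, v)
        + np.einsum("thwg,tgwc->thwc", a_h, v)
        + np.einsum("thwv,thvc->thwc", a_w, v)
    )
    return x + out  # plain residual connection (assumption)


def recurrent_criss_cross_3d(x, wq, wk, wv, steps=3):
    # Re-apply the same pass with shared weights.
    for _ in range(steps):
        x = criss_cross_attention_3d(x, wq, wk, wv)
    return x


rng = np.random.default_rng(0)
T, H, W, C, Ck = 3, 4, 5, 8, 4
x = rng.standard_normal((T, H, W, C))
wq = 0.1 * rng.standard_normal((C, Ck))
wk = 0.1 * rng.standard_normal((C, Ck))
wv = rng.standard_normal((C, C)) / np.sqrt(C)
y = recurrent_criss_cross_3d(x, wq, wk, wv, steps=3)
print(y.shape)  # (3, 4, 5, 8)
```

In this sketch each position attends to T + H + W - 2 positions (itself included) per pass, rather than the T·H·W positions of full pairwise attention; for the toy sizes above that is 10 instead of 60. Because every pass only links positions that differ in a single coordinate, a position that differs from another in all three coordinates is reached only after three passes, which is why `steps=3` is used as the default here.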
Taxonomy
Topics: Human Pose and Action Recognition · Anomaly Detection Techniques and Applications · Multimodal Machine Learning Applications
