Action Recognition in Untrimmed Videos with Composite Self-Attention Two-Stream Framework
Dong Cao, Lisha Xu, HaiBo Chen

TL;DR
This paper introduces a composite self-attention two-stream framework with graph networks for improved zero-shot action recognition in untrimmed videos, emphasizing key frame weighting and multi-aspect attention.
Contribution
The paper proposes a novel composite two-stream framework with 3-channel self-attention and graph networks for zero-shot action recognition in untrimmed videos, enhancing feature extraction and key frame focus.
Findings
Effective in zero-shot action recognition
Improves focus on key frames in untrimmed videos
Validated on relevant datasets with positive results
Abstract
With the rapid development of deep learning algorithms, action recognition in video has achieved many important research results. One issue in action recognition, Zero-Shot Action Recognition (ZSAR), which classifies new categories without any positive examples, has recently attracted considerable attention. Another difficulty in action recognition is that untrimmed data may seriously affect model performance. We propose a composite two-stream framework with a pre-trained model. Our proposed framework includes a classifier branch and a composite feature branch. A graph network model is adopted in each of the two branches, which effectively improves the feature extraction and reasoning ability of the framework. In the composite feature branch, 3-channel self-attention models are constructed to weight each frame in the video and give more attention to the key frames. Each self-attention…
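The frame-weighting idea in the abstract can be illustrated with a minimal sketch. It is not the authors' model: the abstract does not say how the three channels are built or combined, so this uses plain attention pooling (a learned scoring vector per channel followed by a softmax over frames) as a simplified stand-in for self-attention. The names `attention_pool`, `frames`, and `score_vectors`, the feature sizes, and the concatenation of channel outputs are all assumptions made for illustration.

```python
import numpy as np


def softmax(scores):
    """Numerically stable softmax over the last axis."""
    shifted = scores - scores.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)


def attention_pool(frames, score_vectors):
    """Weight frames per channel and pool them into one video feature.

    frames:        (T, D) array, one D-dimensional feature per frame.
    score_vectors: (C, D) array, one hypothetical scoring vector per channel.
    Returns the concatenated (C * D,) feature and the (C, T) frame weights.
    """
    scores = score_vectors @ frames.T   # (C, T): relevance of each frame per channel
    weights = softmax(scores)           # each row sums to 1 across the T frames
    pooled = weights @ frames           # (C, D): weighted average of frames per channel
    return pooled.reshape(-1), weights  # concatenation of channels is an assumption


# Toy input: 16 frames with 8-dimensional features and 3 attention channels.
rng = np.random.default_rng(0)
frames = rng.normal(size=(16, 8))
score_vectors = rng.normal(size=(3, 8))

video_feature, frame_weights = attention_pool(frames, score_vectors)
```

Frames that receive larger weights contribute more to the pooled feature, which is the sense in which key frames in an untrimmed video would get more attention; in a trained model the scoring vectors would be learned rather than random.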
