Learning Transferable Self-attentive Representations for Action Recognition in Untrimmed Videos with Weak Supervision
Xiao-Yu Zhang, Haichao Shi, Changsheng Li, Kai Zheng, Xiaobin Zhu, Lixin Duan

TL;DR
This paper introduces a weakly supervised framework for action recognition in untrimmed videos that leverages self-attention for frame localization and transfers knowledge from trimmed videos to improve classification accuracy.
Contribution
The paper proposes an approach that combines self-attention-based action frame localization with knowledge transfer from trimmed videos. Because it needs only video-level annotations, it avoids the cost of frame-level labeling.
Findings
Self-attention frame weighting is effective at localizing action frames.
Transferring knowledge from trimmed videos improves classification performance.
The framework is validated on the THUMOS14 and ActivityNet1.3 datasets.
Abstract
Action recognition in videos has attracted a lot of attention in the past decade. In order to learn robust models, previous methods usually assume videos are trimmed as short sequences and require ground-truth annotations of each video frame/sequence, which is quite costly and time-consuming. In this paper, given only video-level annotations, we propose a novel weakly supervised framework to simultaneously locate action frames as well as recognize actions in untrimmed videos. Our proposed framework consists of two major components. First, for action frame localization, we take advantage of the self-attention mechanism to weight each frame, such that the influence of background frames can be effectively eliminated. Second, considering that there are trimmed videos publicly available and also they contain useful information to leverage, we present an additional module to transfer the…
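The frame-weighting idea in the abstract can be illustrated with a minimal sketch. It is not the paper's architecture: the function name `attention_pool`, the linear scoring vector `w`, the bias `b`, and the random features are all assumptions made for illustration. It shows only how a softmax over per-frame scores yields weights that down-weight low-scoring (for example, background) frames before the features are pooled into a single video-level vector.

```python
import numpy as np

def attention_pool(frame_features, w, b=0.0):
    """Pool per-frame features into one video-level vector using softmax weights.

    frame_features: (T, D) array, one feature vector per frame.
    w: (D,) hypothetical scoring vector (it would be learned in practice).
    b: hypothetical scalar bias.
    Returns (weights, pooled): weights has shape (T,) and sums to 1,
    pooled has shape (D,).
    """
    scores = frame_features @ w + b        # one scalar score per frame
    scores = scores - scores.max()         # shift for a numerically stable softmax
    exp_scores = np.exp(scores)
    weights = exp_scores / exp_scores.sum()
    pooled = weights @ frame_features      # attention-weighted average of frames
    return weights, pooled

# Illustrative random data: 8 frames with 4-dimensional features.
rng = np.random.default_rng(0)
features = rng.normal(size=(8, 4))
w = rng.normal(size=4)
weights, pooled = attention_pool(features, w)
```

With only video-level labels, a classifier would be trained on `pooled`, and `weights` could then be read as a per-frame relevance signal. Softmax weights never reach exactly zero, so in this sketch background frames are suppressed rather than removed.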
Taxonomy
Topics: Human Pose and Action Recognition · Anomaly Detection Techniques and Applications · Multimodal Machine Learning Applications
