ACM-Net: Action Context Modeling Network for Weakly-Supervised Temporal Action Localization
Sanqing Qu, Guang Chen, Zhijun Li, Lijun Zhang, Fan Lu, Alois Knoll

TL;DR
ACM-Net introduces a three-branch attention approach for weakly-supervised temporal action localization, effectively distinguishing action, context, and background frames to improve localization accuracy.
Contribution
The paper proposes ACM-Net, a novel network with a three-branch attention module that models action, context, and background frames separately for better localization.
Findings
Outperforms current state-of-the-art methods on THUMOS-14 and ActivityNet-1.3 datasets.
Achieves performance comparable to fully-supervised methods.
Demonstrates the effectiveness of multi-branch attention in weakly-supervised settings.
Abstract
Weakly-supervised temporal action localization aims to localize the temporal boundaries of action instances and identify the corresponding action categories using only video-level labels. Traditional methods mainly focus on separating foreground and background frames with a single attention branch and a single class activation sequence. However, we argue that, apart from the distinctive foreground and background frames, there are plenty of semantically ambiguous action context frames. It does not make sense to group those context frames into the same background class, since they are semantically related to a specific action category. Consequently, it is challenging to suppress action context frames with only a single class activation sequence. To address this issue, in this paper, we propose an action-context modeling network termed ACM-Net, which integrates a three-branch attention module to measure the…
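The three-branch idea in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation: the softmax normalization across branches, the single linear projections `w_att` and `w_cls`, the function name `three_branch_attention`, and the random inputs are all assumptions made for illustration. It shows one plausible way to give each snippet an action, context, and background weight, and to derive one attention-weighted class activation sequence per branch.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def three_branch_attention(features, w_att, w_cls):
    """Hypothetical sketch, not the paper's code.

    features: (T, D) per-snippet features
    w_att:    (D, 3) projection to (action, context, background) logits
    w_cls:    (D, C) projection to per-class scores
    """
    # Assumption: branch weights are normalized per snippet, so each row sums to 1.
    attention = softmax(features @ w_att, axis=1)          # (T, 3)
    # Base class activation sequence (CAS): class scores for every snippet.
    cas = features @ w_cls                                 # (T, C)
    # One attention-weighted CAS per branch, stacked as (3, T, C).
    branch_cas = attention.T[:, :, None] * cas[None, :, :]
    return attention, cas, branch_cas

# Toy inputs: 8 snippets, 16-dim features, 5 action classes (all made up).
rng = np.random.default_rng(0)
T, D, C = 8, 16, 5
features = rng.standard_normal((T, D))
w_att = rng.standard_normal((D, 3))
w_cls = rng.standard_normal((D, C))
attention, cas, branch_cas = three_branch_attention(features, w_att, w_cls)
```

Under these assumptions, `branch_cas[0]` emphasizes snippets the action branch attends to, while `branch_cas[1]` and `branch_cas[2]` do the same for context and background. Because the three weights sum to one per snippet, the branch-specific sequences add back up to the base sequence.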
Taxonomy
Topics: Human Pose and Action Recognition · Anomaly Detection Techniques and Applications · Multimodal Machine Learning Applications
