ASM-Loc: Action-aware Segment Modeling for Weakly-Supervised Temporal Action Localization
Bo He, Xitong Yang, Le Kang, Zhiyu Cheng, Xin Zhou, Abhinav Shrivastava

TL;DR
This paper introduces ASM-Loc, a novel weakly-supervised temporal action localization framework that models action segments explicitly using attention, dynamic sampling, and pseudo supervision, achieving state-of-the-art results.
Contribution
ASM-Loc is the first to incorporate explicit action-aware segment modeling with attention and pseudo supervision in weakly-supervised temporal action localization.
Findings
Achieves new state-of-the-art results on the THUMOS-14 and ActivityNet-v1.3 datasets.
Improves boundary prediction accuracy with pseudo instance supervision.
Effectively models temporal dependencies with attention mechanisms.
Abstract
Weakly-supervised temporal action localization (WTAL) aims to recognize and localize action segments in untrimmed videos given only video-level action labels for training. Without the boundary information of action segments, existing methods mostly rely on multiple instance learning (MIL), where the predictions of unlabeled instances (i.e., video snippets) are supervised by classifying labeled bags (i.e., untrimmed videos). However, this formulation typically treats snippets in a video as independent instances, ignoring the underlying temporal structures within and across action segments. To address this problem, we propose ASM-Loc, a novel WTAL framework that enables explicit, action-aware segment modeling beyond standard MIL-based methods. Our framework entails three segment-centric components: (i) dynamic segment sampling for compensating the contribution of short actions; (ii) intra- and…
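The MIL formulation described in the abstract can be illustrated with a minimal sketch. It assumes top-k mean pooling to turn per-snippet class scores into a video-level prediction, which is a common choice in MIL-based localization but is not confirmed here as ASM-Loc's exact aggregation. The function names (`video_level_scores`, `mil_loss`) and the toy inputs are hypothetical.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def video_level_scores(snippet_logits, k):
    """Pool per-snippet class logits (T x C) into video-level logits (C).

    Assumption: each class score is the mean of its k highest snippet
    logits (top-k pooling). Snippets are scored independently, which is
    the limitation the abstract points out.
    """
    num_classes = len(snippet_logits[0])
    video_logits = []
    for c in range(num_classes):
        column = sorted((row[c] for row in snippet_logits), reverse=True)
        top = column[:k]
        video_logits.append(sum(top) / len(top))
    return video_logits

def mil_loss(snippet_logits, video_label, k):
    """Cross-entropy between the pooled prediction and the video-level label.

    video_label is a distribution over classes (e.g. one-hot); only the
    bag (video) is labeled, never the individual snippets.
    """
    probs = softmax(video_level_scores(snippet_logits, k))
    return -sum(y * math.log(p) for y, p in zip(video_label, probs) if y > 0)

# Toy video: 3 snippets, 2 classes, labeled with class 0 only.
toy_logits = [[2.0, 0.0], [4.0, 0.0], [0.0, 1.0]]
pooled = video_level_scores(toy_logits, k=2)
loss = mil_loss(toy_logits, [1.0, 0.0], k=2)
```

Because the loss only sees the pooled scores, gradients reach just the top-scoring snippets of each class, and nothing ties neighboring snippets together. That gap is what segment-level modeling is meant to address.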
Taxonomy
Topics: Human Pose and Action Recognition · Anomaly Detection Techniques and Applications · Multimodal Machine Learning Applications
