Weakly-Supervised Action Localization by Hierarchically-structured Latent Attention Modeling
Guiqin Wang, Peng Zhao, Cong Zhao, Shusen Yang, Jie Cheng, Luziwei Leng, Jianxing Liao, Qinghai Guo

TL;DR
This paper introduces a hierarchical latent attention model for weakly-supervised action localization in videos, effectively capturing temporal feature variations to improve localization accuracy.
Contribution
It proposes a novel hierarchical model combining change-point detection and attention-based classification to better handle temporal variations in weakly-supervised settings.
Findings
Outperforms current state-of-the-art methods on THUMOS-14 and ActivityNet-v1.3 datasets.
Achieves performance comparable to fully-supervised methods.
Effectively models temporal feature variations for improved localization.
Abstract
Weakly-supervised action localization aims to recognize and localize action instances in untrimmed videos with only video-level labels. Most existing models rely on multiple instance learning (MIL), where the predictions of unlabeled instances are supervised by classifying labeled bags. The MIL-based methods are relatively well studied with cogent performance achieved on classification but not on localization. Generally, they locate temporal regions by the video-level classification but overlook the temporal variations of feature semantics. To address this problem, we propose a novel attention-based hierarchically-structured latent model to learn the temporal variations of feature semantics. Specifically, our model entails two components, the first is an unsupervised change-points detection module that detects change-points by learning the latent representations of video features in a…
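To make the MIL setup described in the abstract concrete, the following is a minimal sketch of a generic MIL baseline, not the model proposed in this paper. It assumes per-snippet class scores are already available, pools them into a video-level score with top-k mean pooling (one common choice, assumed here), trains against the video-level label only, and localizes by thresholding snippet scores. All function names, the pooling rule, and the threshold are illustrative assumptions.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of class scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def video_level_scores(snippet_scores, k):
    """Pool per-snippet scores (T x C) into one score per class.

    Assumption: top-k mean pooling over time, a common MIL aggregation.
    """
    num_classes = len(snippet_scores[0])
    pooled = []
    for c in range(num_classes):
        column = sorted((row[c] for row in snippet_scores), reverse=True)
        pooled.append(sum(column[:k]) / k)
    return pooled


def mil_loss(snippet_scores, video_label, k):
    """Cross-entropy on the pooled (bag-level) prediction.

    Only the video-level label supervises the snippet scores.
    """
    probs = softmax(video_level_scores(snippet_scores, k))
    return -math.log(probs[video_label])


def localize(snippet_scores, class_index, threshold):
    """Threshold snippet scores and merge consecutive hits into [start, end) segments."""
    segments = []
    start = None
    for t, row in enumerate(snippet_scores):
        active = row[class_index] > threshold
        if active and start is None:
            start = t
        elif not active and start is not None:
            segments.append((start, t))
            start = None
    if start is not None:
        segments.append((start, len(snippet_scores)))
    return segments


# Toy input: 5 snippets, 2 classes (values are made up for illustration).
toy_scores = [
    [0.1, 2.0],
    [0.2, 3.0],
    [0.1, 0.0],
    [0.0, 2.5],
    [0.3, 0.1],
]
toy_segments = localize(toy_scores, class_index=1, threshold=1.0)
```

Because the loss in this sketch sees only the pooled score, nothing constrains where segment boundaries fall, which is the localization weakness the abstract attributes to MIL-based methods.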
Taxonomy
Topics: Human Pose and Action Recognition · Multimodal Machine Learning Applications · Anomaly Detection Techniques and Applications
