Proposal-Based Multiple Instance Learning for Weakly-Supervised Temporal Action Localization
Huan Ren, Wenfei Yang, Tianzhu Zhang, Yongdong Zhang

TL;DR
This paper introduces a Proposal-based Multiple Instance Learning (P-MIL) framework for weakly-supervised temporal action localization. It addresses the mismatch in earlier segment-based methods, which train on segment-level scores but are tested on proposal-level scores, by classifying candidate proposals directly in both stages and by incorporating contrastive, completeness, and modality consistency modules.
Contribution
The paper proposes a novel Proposal-based MIL framework with three key modules that improve proposal classification and localization accuracy in weakly-supervised temporal action localization.
Findings
Outperforms existing methods on THUMOS14 and ActivityNet benchmarks.
Effectively suppresses low-quality proposals and enhances detection robustness.
Achieves superior localization accuracy with proposed modules.
Abstract
Weakly-supervised temporal action localization aims to localize and recognize actions in untrimmed videos with only video-level category labels during training. Without instance-level annotations, most existing methods follow the Segment-based Multiple Instance Learning (S-MIL) framework, where the predictions of segments are supervised by the labels of videos. However, the objective for acquiring segment-level scores during training is not consistent with the target for acquiring proposal-level scores during testing, leading to suboptimal results. To deal with this problem, we propose a novel Proposal-based Multiple Instance Learning (P-MIL) framework that directly classifies the candidate proposals in both the training and testing stages, which includes three key designs: 1) a surrounding contrastive feature extraction module to suppress the discriminative short proposals by…
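The difference between the two frameworks described in the abstract can be illustrated with a minimal sketch. This is a toy example under stated assumptions, not the authors' implementation: the scalar features, the `classify` function, the top-k mean pooling, and the hand-written proposals are all hypothetical stand-ins chosen only to show where the video-level label would attach in each case.

```python
import math
from typing import List, Tuple


def classify(feature: float, weight: float = 2.0, bias: float = -1.0) -> float:
    """Hypothetical single-class classifier: sigmoid of a linear score."""
    return 1.0 / (1.0 + math.exp(-(weight * feature + bias)))


def top_k_mean(scores: List[float], k: int) -> float:
    """Average of the k largest scores (an assumed MIL pooling choice)."""
    k = max(1, min(k, len(scores)))
    return sum(sorted(scores, reverse=True)[:k]) / k


def s_mil_video_score(segment_features: List[float], k: int) -> float:
    """Segment-based MIL: score every segment, then pool segment scores.

    The video-level label would supervise this pooled segment score.
    """
    segment_scores = [classify(f) for f in segment_features]
    return top_k_mean(segment_scores, k)


def p_mil_video_score(
    segment_features: List[float],
    proposals: List[Tuple[int, int]],
    k: int,
) -> Tuple[float, List[float]]:
    """Proposal-based MIL: score every proposal, then pool proposal scores.

    Each proposal is a half-open index range [start, end) over segments.
    The same proposal scores are used at training and at test time.
    """
    proposal_scores = []
    for start, end in proposals:
        # Assumed proposal feature: mean of the segment features it covers.
        pooled = sum(segment_features[start:end]) / (end - start)
        proposal_scores.append(classify(pooled))
    return top_k_mean(proposal_scores, k), proposal_scores


# Toy video with six segments; the action roughly spans segments 2..4.
features = [0.1, 0.2, 0.9, 1.0, 0.8, 0.1]
proposals = [(0, 2), (2, 5), (4, 6)]  # hypothetical candidate proposals

video_score_s = s_mil_video_score(features, k=2)
video_score_p, proposal_scores = p_mil_video_score(features, proposals, k=1)
```

In the segment-based variant, the quantity being trained (segment scores) differs from the quantity ranked at test time (proposal scores). In the proposal-based variant, the proposal scores themselves are pooled and supervised, which is the consistency the abstract argues for.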
Taxonomy
Topics: Human Pose and Action Recognition · Multimodal Machine Learning Applications · Anomaly Detection Techniques and Applications
