Proposal-based Temporal Action Localization with Point-level Supervision
Yuan Yin, Yifei Huang, Ryosuke Furuta, Yoichi Sato

TL;DR
This paper introduces a novel point-level supervised temporal action localization method that generates flexible-duration action proposals and dense pseudo labels, outperforming existing methods on multiple benchmarks.
Contribution
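The paper proposes an approach that addresses the limitations of multiple instance learning (MIL) by generating action proposals, clustering to produce pseudo labels, and applying a contrastive loss for refinement, improving performance on point-level supervised temporal action localization (PTAL).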
Findings
Achieves state-of-the-art results on ActivityNet 1.3 and THUMOS 14
Outperforms some fully supervised methods on benchmark datasets
Demonstrates effectiveness of proposal generation and pseudo label refinement
Abstract
Point-level supervised temporal action localization (PTAL) aims at recognizing and localizing actions in untrimmed videos where only a single point (frame) within every action instance is annotated in training data. Without temporal annotations, most previous works adopt the multiple instance learning (MIL) framework, where the input video is segmented into non-overlapped short snippets, and action classification is performed independently on every short snippet. We argue that the MIL framework is suboptimal for PTAL because it operates on separated short snippets that contain limited temporal information. Therefore, the classifier only focuses on several easy-to-distinguish snippets instead of discovering the whole action instance without missing any relevant snippets. To alleviate this problem, we propose a novel method that localizes actions by generating and evaluating action…
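The snippet-level MIL setup that the abstract criticizes can be sketched as follows. This is a minimal illustration, not the paper's code: the helper names, the snippet length of 16 frames, the example scores, and the top-k mean pooling are all assumptions chosen for illustration (top-k pooling is a common MIL aggregation, but the abstract does not say which one prior works use).

```python
from typing import List


def split_into_snippets(num_frames: int, snippet_len: int) -> List[range]:
    """Split frame indices into non-overlapping snippets (the last may be shorter)."""
    return [
        range(start, min(start + snippet_len, num_frames))
        for start in range(0, num_frames, snippet_len)
    ]


def point_to_snippet(frame_idx: int, snippet_len: int) -> int:
    """Map a single annotated frame (the point label) to its snippet index."""
    return frame_idx // snippet_len


def topk_mean(scores: List[float], k: int) -> float:
    """Aggregate per-snippet scores into one video-level score (assumed pooling)."""
    top = sorted(scores, reverse=True)[:k]
    return sum(top) / len(top)


# Hypothetical video: 100 frames, snippets of 16 frames each.
snippets = split_into_snippets(num_frames=100, snippet_len=16)

# Hypothetical per-snippet scores for one action class, each produced
# independently by a snippet classifier.
snippet_scores = [0.1, 0.2, 0.9, 0.8, 0.3, 0.1, 0.05]

# Only the k highest-scoring snippets influence the video-level score.
video_score = topk_mean(snippet_scores, k=2)

# The one annotated frame per action instance lands in exactly one snippet.
labeled_snippet = point_to_snippet(frame_idx=40, snippet_len=16)
```

In this sketch only the two highest-scoring snippets contribute to the video-level score, which illustrates the abstract's concern: the classifier is rewarded for a few easy-to-distinguish snippets rather than for covering the whole action instance.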
Taxonomy
Topics: Human Pose and Action Recognition · Diabetic Foot Ulcer Assessment and Management · Multimodal Machine Learning Applications
