Full-Stage Pseudo Label Quality Enhancement for Weakly-supervised Temporal Action Localization
Qianhan Feng, Wenshuo Li, Tong Lin, Xinghao Chen

TL;DR
This paper introduces FuSTAL, a framework that significantly improves pseudo label quality at multiple stages in weakly-supervised temporal action localization, leading to state-of-the-art results on THUMOS'14.
Contribution
The paper proposes a novel multi-stage pseudo label enhancement framework for WSTAL, improving label quality and localization accuracy.
Findings
Achieves 50.8% mAP on THUMOS'14, surpassing the previous best by 1.2%.
Introduces cross-video contrastive learning at the proposal generation stage, prior-based filtering at the proposal selection stage, and EMA distillation.
First method to reach the 50% mAP milestone in WSTAL.
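As a rough illustration of the EMA idea named in the findings, the sketch below shows a generic exponential-moving-average weight update, in which a teacher's parameters slowly track a student's. This is not the paper's implementation: the function name `ema_update`, the dictionary representation of weights, and the `momentum` value are all assumptions made for illustration.

```python
def ema_update(teacher, student, momentum=0.999):
    """Return new teacher weights as an exponential moving average.

    teacher, student: dicts mapping parameter names to floats
    (a hypothetical stand-in for real model tensors).
    momentum: fraction of the old teacher weight that is kept.
    """
    return {
        name: momentum * teacher[name] + (1.0 - momentum) * student[name]
        for name in teacher
    }


# Toy weights; a low momentum is used here only to make the shift visible.
teacher = {"w": 1.0, "b": 0.0}
student = {"w": 0.0, "b": 1.0}
teacher = ema_update(teacher, student, momentum=0.9)
```

With a momentum close to 1, the teacher changes slowly and smooths out step-to-step noise in the student, which is the usual motivation for EMA-based distillation.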
Abstract
Weakly-supervised Temporal Action Localization (WSTAL) aims to localize actions in untrimmed videos using only video-level supervision. Recent WSTAL methods introduce a pseudo label learning framework to bridge the gap between classification-based training and the localization target at inference, and achieve cutting-edge results. In these frameworks, a classification-based model generates pseudo labels for a regression-based student model to learn from. However, the quality of the pseudo labels, which is a key factor in the final result, has not been carefully studied. In this paper, we propose a set of simple yet efficient pseudo label quality enhancement mechanisms to build our FuSTAL framework. FuSTAL enhances pseudo label quality at three stages: cross-video contrastive learning at proposal Generation-Stage, prior-based filtering at proposal Selection-Stage and…
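To make the pseudo label step more concrete, the sketch below shows one common way per-snippet classification scores can be turned into temporal segments that a student model could then regress. It is a minimal sketch under stated assumptions, not the procedure used by FuSTAL: the function `scores_to_segments`, the fixed `threshold`, and the example scores are all hypothetical.

```python
def scores_to_segments(scores, threshold=0.5):
    """Group consecutive snippets scoring above threshold into segments.

    Returns a list of (start, end, mean_score) tuples, where start is
    inclusive and end is exclusive, both in snippet indices.
    """
    segments = []
    start = None
    for i, s in enumerate(scores):
        if s > threshold and start is None:
            start = i  # a new candidate segment begins here
        elif s <= threshold and start is not None:
            seg = scores[start:i]
            segments.append((start, i, sum(seg) / len(seg)))
            start = None
    if start is not None:  # close a segment that runs to the end
        seg = scores[start:]
        segments.append((start, len(scores), sum(seg) / len(seg)))
    return segments


# Hypothetical per-snippet scores for one action class in one video.
example_scores = [0.1, 0.8, 0.9, 0.2, 0.7, 0.6]
pseudo_labels = scores_to_segments(example_scores)
```

Segments produced this way inherit any errors in the classification scores, which is why the quality of the pseudo labels matters for the student model.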
Taxonomy
Topics: Anomaly Detection Techniques and Applications
Methods: Sparse Evolutionary Training · Contrastive Learning
