Rethinking Pseudo-Label Guided Learning for Weakly Supervised Temporal Action Localization from the Perspective of Noise Correction

Quan Zhang; Yuxin Qi; Xi Tang; Rui Yuan; Xi Lin; Ke Zhang; Chun Yuan

arXiv:2501.11124·cs.CV·May 1, 2025

TL;DR

This paper proposes a two-stage noise correction strategy for pseudo-label guided learning in weakly-supervised temporal action localization, improving both detection accuracy and inference speed on THUMOS14 and ActivityNet v1.2.

Contribution

It introduces a context-aware denoising algorithm and an online-revised teacher-student framework to address noise issues in pseudo-labels, enhancing localization performance.

Findings

01 Outperforms previous state-of-the-art on THUMOS14 and ActivityNet v1.2

02 Achieves higher detection accuracy and faster inference speed

03 Effectively handles boundary inaccuracies and short action clips

Abstract

Pseudo-label learning methods have been widely applied in weakly-supervised temporal action localization. Existing works directly utilize a weakly-supervised base model to generate instance-level pseudo-labels for training the fully-supervised detection head. We argue that the noise in pseudo-labels interferes with the learning of the fully-supervised detection head, leading to significant performance leakage. Issues with noisy labels include: (1) inaccurate boundary localization; (2) undetected short action clips; (3) multiple adjacent segments incorrectly detected as one segment. To target these issues, we introduce a two-stage noisy label learning strategy to harness every potentially useful signal in noisy labels. First, we propose a frame-level pseudo-label generation model with a context-aware denoising algorithm to refine the boundaries. Second, we introduce an online-revised…
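The three noise types listed in the abstract can be illustrated with a toy example. The sketch below is not the paper's method: it assumes a plain fixed-threshold rule for turning frame-level scores into segments, and the scores, ground-truth segments, threshold, and function names (`scores_to_segments`, `temporal_iou`) are all hypothetical, chosen only to make the failure modes visible.

```python
from typing import List, Optional, Tuple

Segment = Tuple[int, int]  # half-open [start, end) in frame indices


def scores_to_segments(scores: List[float], threshold: float) -> List[Segment]:
    """Group consecutive frames whose score reaches the threshold into segments."""
    segments: List[Segment] = []
    start: Optional[int] = None
    for i, score in enumerate(scores):
        if score >= threshold and start is None:
            start = i  # a new segment opens at this frame
        elif score < threshold and start is not None:
            segments.append((start, i))  # the open segment closes before frame i
            start = None
    if start is not None:
        segments.append((start, len(scores)))
    return segments


def temporal_iou(a: Segment, b: Segment) -> float:
    """Intersection over union of two half-open temporal segments."""
    inter = max(0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0


# Hypothetical ground truth: two adjacent actions and one short action.
ground_truth: List[Segment] = [(2, 6), (7, 11), (14, 16)]

# Hypothetical frame-level scores from a weakly supervised base model.
scores = [0.1, 0.2, 0.4, 0.8, 0.9, 0.8, 0.7, 0.6, 0.9,
          0.9, 0.8, 0.6, 0.2, 0.1, 0.4, 0.4, 0.1, 0.1]

pseudo_labels = scores_to_segments(scores, threshold=0.5)
print(pseudo_labels)  # [(3, 12)]

# Overlap of the single pseudo-label with each ground-truth segment.
ious = [round(temporal_iou(pseudo_labels[0], gt), 2) for gt in ground_truth]
print(ious)  # [0.3, 0.44, 0.0]
```

With these made-up scores, thresholding yields a single segment (3, 12). Its boundaries are off by one frame on each side, it merges the two adjacent actions into one, and the short action at frames 14 to 15 is never detected, which matches issues (1), (3), and (2) respectively.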

Taxonomy

Topics: Anomaly Detection Techniques and Applications · Human Pose and Action Recognition

Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings · Balanced Selection