Improving Weakly Supervised Temporal Action Localization by Exploiting Multi-resolution Information in Temporal Domain

Rui Su; Dong Xu; Luping Zhou; Wanli Ouyang

arXiv:2506.18261·cs.CV·June 24, 2025

TL;DR

This paper introduces a two-stage method that leverages multi-resolution temporal information and iterative pseudo label refinement to improve weakly supervised temporal action localization.

Contribution

The paper proposes a two-stage multi-resolution framework: the first stage generates initial frame-level pseudo labels, and the second stage progressively refines them, improving pseudo label quality and, in turn, action localization.

Findings

01. Improved localization accuracy over existing methods

02. Effective use of multi-resolution temporal information

03. Enhanced pseudo label quality through iterative refinement

Abstract

Weakly supervised temporal action localization is a challenging task as only the video-level annotation is available during the training process. To address this problem, we propose a two-stage approach to fully exploit multi-resolution information in the temporal domain and generate high quality frame-level pseudo labels based on both appearance and motion streams. Specifically, in the first stage, we generate reliable initial frame-level pseudo labels, and in the second stage, we iteratively refine the pseudo labels and use a set of selected frames with highly confident pseudo labels to train neural networks and better predict action class scores at each frame. We fully exploit temporal information at multiple scales to improve temporal action localization performance. Specifically, in order to obtain reliable initial frame-level pseudo labels, in the first stage, we propose an…
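The abstract is truncated here, so the paper's actual networks and refinement rules are not available. As a rough illustration only, the general idea of combining two streams, pooling over several temporal scales, and keeping only confident frames as pseudo labels might be sketched as follows. Every name, window size, and threshold below (`smooth`, `multi_resolution_scores`, `select_confident`, `windows`, `high`, `low`) is a hypothetical choice for this sketch and not the authors' method.

```python
from typing import List, Optional, Sequence


def smooth(scores: Sequence[float], window: int) -> List[float]:
    """Centered moving average, truncated at the edges (assumes an odd window)."""
    half = window // 2
    n = len(scores)
    out = []
    for t in range(n):
        lo, hi = max(0, t - half), min(n, t + half + 1)
        out.append(sum(scores[lo:hi]) / (hi - lo))
    return out


def multi_resolution_scores(
    appearance: Sequence[float],
    motion: Sequence[float],
    windows: Sequence[int] = (1, 3, 5),  # hypothetical temporal scales
) -> List[float]:
    """Average the two streams, then average their smoothed versions across scales."""
    fused = [(a + m) / 2 for a, m in zip(appearance, motion)]
    smoothed = [smooth(fused, w) for w in windows]
    return [sum(vals) / len(windows) for vals in zip(*smoothed)]


def select_confident(
    scores: Sequence[float],
    high: float = 0.7,  # hypothetical threshold for "action"
    low: float = 0.3,   # hypothetical threshold for "background"
) -> List[Optional[int]]:
    """Return 1 (action), 0 (background), or None (too uncertain to use)."""
    return [1 if s >= high else 0 if s <= low else None for s in scores]


# Toy per-frame scores for one action class from each stream (made-up values).
appearance = [0.1, 0.2, 0.9, 0.8, 0.9, 0.2, 0.1]
motion = [0.0, 0.4, 0.8, 1.0, 0.7, 0.4, 0.1]

pseudo_labels = select_confident(multi_resolution_scores(appearance, motion))
print(pseudo_labels)
```

In an iterative scheme of the kind the abstract describes, only the frames labeled 1 or 0 would be used to train the next round of frame-level classifiers, whose predictions would then replace the input scores. The frames marked `None` are simply left out until a later round assigns them a confident score.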
