Improving Weakly Supervised Temporal Action Localization by Exploiting Multi-resolution Information in Temporal Domain
Rui Su, Dong Xu, Luping Zhou, Wanli Ouyang

TL;DR
This paper introduces a two-stage method that leverages multi-resolution temporal information and iterative pseudo label refinement to improve weakly supervised temporal action localization.
Contribution
It proposes a novel multi-resolution framework with initial label generation and progressive refinement, enhancing pseudo label quality for better action localization.
Findings
Improved localization accuracy over existing methods
Effective use of multi-resolution temporal information
Enhanced pseudo label quality through iterative refinement
Abstract
Weakly supervised temporal action localization is a challenging task as only the video-level annotation is available during the training process. To address this problem, we propose a two-stage approach to fully exploit multi-resolution information in the temporal domain and generate high-quality frame-level pseudo labels based on both appearance and motion streams. Specifically, in the first stage, we generate reliable initial frame-level pseudo labels, and in the second stage, we iteratively refine the pseudo labels and use a set of selected frames with highly confident pseudo labels to train neural networks and better predict action class scores at each frame. We fully exploit temporal information at multiple scales to improve temporal action localization performance. Specifically, in order to obtain reliable initial frame-level pseudo labels, in the first stage, we propose an…
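The abstract's core idea, combining predictions made at several temporal resolutions and keeping only frames with highly confident pseudo labels, can be illustrated with a toy sketch. Everything in it is an assumption made for illustration: the function names, the nearest-neighbour upsampling, the averaging fusion rule, and the fixed `high`/`low` thresholds are hypothetical and are not taken from the paper, whose actual modules are not described in the text available here.

```python
from typing import List, Optional


def upsample(scores: List[float], num_frames: int) -> List[float]:
    """Nearest-neighbour upsampling of a coarse score sequence to num_frames."""
    n = len(scores)
    return [scores[min(i * n // num_frames, n - 1)] for i in range(num_frames)]


def fuse_multi_resolution(score_sets: List[List[float]], num_frames: int) -> List[float]:
    """Average per-frame action scores from several temporal resolutions.

    Averaging is an assumed fusion rule, chosen only for simplicity.
    """
    upsampled = [upsample(s, num_frames) for s in score_sets]
    return [sum(column) / len(column) for column in zip(*upsampled)]


def select_confident_frames(
    scores: List[float], high: float = 0.8, low: float = 0.2
) -> List[Optional[int]]:
    """Assign pseudo labels: 1 = action, 0 = background, None = left unlabeled.

    The thresholds are hypothetical; uncertain frames are simply skipped.
    """
    labels: List[Optional[int]] = []
    for s in scores:
        if s >= high:
            labels.append(1)
        elif s <= low:
            labels.append(0)
        else:
            labels.append(None)
    return labels


# Toy scores for one action class: 8 frames at fine resolution, 4 at coarse.
fine = [0.9, 0.9, 0.1, 0.1, 0.5, 0.5, 0.9, 0.1]
coarse = [0.9, 0.1, 0.5, 0.5]

fused = fuse_multi_resolution([fine, coarse], num_frames=8)
pseudo_labels = select_confident_frames(fused)
print(pseudo_labels)
```

In this toy run, the last two frames are left unlabeled because the two resolutions disagree there. In an iterative scheme like the one the abstract outlines, only the labeled frames would be used to retrain the frame-level classifier before the scores are recomputed and the selection is repeated.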
