TL;DR
This paper introduces a weakly-supervised approach to temporal action localization that treats background frames as out-of-distribution samples and detects them through uncertainty estimation, reporting state-of-the-art results on THUMOS'14 and ActivityNet.
Contribution
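The paper proposes an uncertainty modeling framework for weakly-supervised temporal action localization. Uncertainty is learned from video-level labels through a multiple instance learning formulation, and a background entropy loss is added to better separate background frames from action frames.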
Findings
Achieves state-of-the-art results on THUMOS'14 and ActivityNet datasets.
Effectively reduces background interference in action localization.
Demonstrates the benefit of uncertainty modeling in weakly-supervised learning.
Abstract
Weakly-supervised temporal action localization aims to learn detecting temporal intervals of action classes with only video-level labels. To this end, it is crucial to separate frames of action classes from the background frames (i.e., frames not belonging to any action classes). In this paper, we present a new perspective on background frames where they are modeled as out-of-distribution samples regarding their inconsistency. Then, background frames can be detected by estimating the probability of each frame being out-of-distribution, known as uncertainty, but it is infeasible to directly learn uncertainty without frame-level labels. To realize the uncertainty learning in the weakly-supervised setting, we leverage the multiple instance learning formulation. Moreover, we further introduce a background entropy loss to better discriminate background frames by encouraging their…
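The multiple instance learning formulation mentioned in the abstract can be illustrated with a minimal sketch. It assumes a common MIL recipe, not necessarily the authors' exact one: each frame gets per-class scores, the top-k frame scores per class are averaged into a video-level score, and the only supervision is a cross-entropy loss against the video-level label. The function names (`topk_mean_pool`, `video_level_loss`) and the choice of `k` are hypothetical, and the sketch covers neither the uncertainty term nor the background entropy loss, whose definition is cut off in the abstract above.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def topk_mean_pool(frame_scores, k):
    """Aggregate frame-level scores (T frames x C classes) into C video-level
    scores by averaging the k highest-scoring frames of each class.
    Assumption: top-k mean pooling stands in for the paper's aggregation."""
    num_classes = len(frame_scores[0])
    k = max(1, min(k, len(frame_scores)))
    video_scores = []
    for c in range(num_classes):
        column = sorted((frame[c] for frame in frame_scores), reverse=True)
        video_scores.append(sum(column[:k]) / k)
    return video_scores


def video_level_loss(frame_scores, video_labels, k):
    """Cross-entropy between the pooled video-level prediction and the
    normalized multi-hot video label. No frame-level labels are used."""
    probs = softmax(topk_mean_pool(frame_scores, k))
    label_total = sum(video_labels)
    return -sum(
        (y / label_total) * math.log(p + 1e-12)
        for y, p in zip(video_labels, probs)
    )


# Toy video: 4 frames, 2 action classes; the first two frames look like class 0.
frame_scores = [[5.0, 0.0], [4.0, 0.0], [0.0, 0.1], [0.0, 0.0]]
pooled = topk_mean_pool(frame_scores, k=2)
loss_correct = video_level_loss(frame_scores, [1, 0], k=2)
loss_wrong = video_level_loss(frame_scores, [0, 1], k=2)
```

In this toy case the pooled score for class 0 is driven by the two action-like frames, so the loss is lower when the video label is class 0 than when it is class 1. The low-scoring frames never enter the pooled score, which is why background frames need separate treatment in this setting.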
