Action Shuffling for Weakly Supervised Temporal Localization
Xiao-Yu Zhang, Haichao Shi, Changsheng Li, Xinchu Shi

TL;DR
This paper introduces ActShufNet, a novel two-branch network with action shuffling techniques and adversarial training to improve weakly supervised temporal action localization using only video-level labels.
Contribution
It proposes a self-augmented learning framework with intra- and inter-action shuffling, enhancing video representation and training set diversity without external data.
Findings
The authors report significant performance improvements on benchmark datasets.
Action shuffling is an effective source of self-augmentation when only video-level labels are available.
Adversarial training is reported to make the model more robust.
Abstract
Weakly supervised action localization is a challenging task with extensive applications, which aims to identify actions and the corresponding temporal intervals with only video-level annotations available. This paper analyzes the order-sensitive and location-insensitive properties of actions, and embodies them into a self-augmented learning framework to improve the weakly supervised action localization performance. To be specific, we propose a novel two-branch network architecture with intra/inter-action shuffling, referred to as ActShufNet. The intra-action shuffling branch lays out a self-supervised order prediction task to augment the video representation with inner-video relevance, whereas the inter-action shuffling branch imposes a reorganizing strategy on the existing action contents to augment the training set without resorting to any external resources. Furthermore, the…
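The two shuffling ideas in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the function names, the chunk count, the permutation-index label, and the representation of a video as a list of (label, segments) pieces are all assumptions made for illustration. In particular, the piece boundaries here are hypothetical; under weak supervision they would have to come from some proposal step that this excerpt does not describe.

```python
import itertools
import random


def intra_action_shuffle(segments, num_chunks=3, rng=None):
    """Permute contiguous chunks of one video (hypothetical pretext task).

    Returns the shuffled segment list and the index of the permutation used,
    which a classifier could be trained to predict (order-sensitive property).
    """
    rng = rng or random.Random()
    n = len(segments)
    if n < num_chunks:
        raise ValueError("need at least one segment per chunk")
    # Integer boundaries guarantee non-empty, contiguous chunks.
    bounds = [i * n // num_chunks for i in range(num_chunks + 1)]
    chunks = [segments[bounds[i]:bounds[i + 1]] for i in range(num_chunks)]
    perms = list(itertools.permutations(range(num_chunks)))
    label = rng.randrange(len(perms))
    shuffled = [seg for idx in perms[label] for seg in chunks[idx]]
    return shuffled, label


def inter_action_shuffle(video_a, video_b, rng=None):
    """Recombine pieces of two videos into one synthetic training video.

    Each video is a list of (label, segments) pieces; label None marks
    background. Pieces are reordered as whole units, so the order inside an
    action is kept while its position changes (location-insensitive property).
    """
    rng = rng or random.Random()
    pieces = list(video_a) + list(video_b)
    rng.shuffle(pieces)
    segments = [seg for _, segs in pieces for seg in segs]
    # The video-level label of the new sample is the union of action labels.
    labels = {label for label, _ in pieces if label is not None}
    return segments, labels
```

In this sketch, `intra_action_shuffle` yields a (shuffled input, permutation index) pair for a self-supervised order-prediction head, and `inter_action_shuffle` yields a new weakly labeled video built only from existing training content.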
