Weakly Supervised Action Selection Learning in Video

Junwei Ma; Satya Krishna Gorti; Maksims Volkovs; Guangwei Yu

arXiv:2105.02439 · cs.CV · May 7, 2021 · 1 citation

TL;DR

This paper introduces Action Selection Learning (ASL), a weakly supervised approach that improves video action localization by capturing the concept of "actionness". It outperforms existing methods on the THUMOS-14 and ActivityNet-1.2 benchmarks.

Contribution

The paper proposes ASL, which adds a class-agnostic training task that models "actionness". This task improves weakly supervised action localization and reduces the class bias that a frame-level classifier imparts on every frame it selects.

Findings

01

ASL outperforms baselines on THUMOS-14 and ActivityNet-1.2.

02

ASL achieves 10.3% and 5.7% relative improvements over leading baselines on THUMOS-14 and ActivityNet-1.2, respectively.

03

Actionness is crucial for effective weakly supervised localization.
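The relative improvements in finding 02 are presumably computed in the usual way: the gain is divided by the baseline's score. In the formula below, $m$ is assumed to be each benchmark's headline localization metric for ASL and for the strongest baseline. This summary does not name the metric.

```latex
\text{relative improvement} = \frac{m_{\text{ASL}} - m_{\text{baseline}}}{m_{\text{baseline}}} \times 100\%
```

Under this reading, a 10.3% relative improvement means ASL's score is about 1.103 times the baseline's score. It does not mean the score is 10.3 points higher.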

Abstract

Localizing actions in video is a core task in computer vision. The weakly supervised temporal localization problem investigates whether this task can be adequately solved with only video-level labels, significantly reducing the amount of expensive and error-prone annotation that is required. A common approach is to train a frame-level classifier where frames with the highest class probability are selected to make a video-level prediction. Frame-level activations are then used for localization. However, the absence of frame-level annotations causes the classifier to impart class bias on every frame. To address this, we propose the Action Selection Learning (ASL) approach to capture the general concept of action, a property we refer to as "actionness". Under ASL, the model is trained with a novel class-agnostic task to predict which frames will be selected by the classifier. Empirically,…
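The pipeline described in the abstract can be sketched as below. This is a minimal NumPy illustration under stated assumptions. It is not the authors' implementation; the official PyTorch code is in layer6ai-labs/ASL. The sketch makes three assumptions:

- Frame selection uses top-k mean pooling per class.
- The video-level loss is softmax cross-entropy against normalized multi-hot labels.
- The class-agnostic target marks every frame selected for any ground-truth class.

All function names are hypothetical. The per-frame actionness logits would come from a separate class-agnostic head, which is not shown.

```python
import numpy as np


def topk_pool(frame_logits: np.ndarray, k: int):
    """Select the k highest-scoring frames per class and average their scores.

    frame_logits: (T, C) per-frame class scores from a frame-level classifier.
    Returns video-level scores of shape (C,) and selected frame indices of shape (k, C).
    """
    # Sort each class column in descending order and keep the first k frame indices.
    idx = np.argsort(-frame_logits, axis=0, kind="stable")[:k]
    top = np.take_along_axis(frame_logits, idx, axis=0)
    return top.mean(axis=0), idx


def video_loss(video_scores: np.ndarray, labels: np.ndarray) -> float:
    """Cross-entropy between softmax(video_scores) and the normalized multi-hot labels."""
    target = labels / labels.sum()
    m = video_scores.max()  # subtract the max for a numerically stable log-softmax
    log_probs = video_scores - (m + np.log(np.exp(video_scores - m).sum()))
    return float(-(target * log_probs).sum())


def actionness_targets(selected_idx: np.ndarray, labels: np.ndarray, num_frames: int) -> np.ndarray:
    """Hypothetical class-agnostic target: 1 for any frame the classifier selected
    for a ground-truth class of the video, 0 otherwise."""
    target = np.zeros(num_frames)
    for c in np.flatnonzero(labels):
        target[selected_idx[:, c]] = 1.0
    return target


def actionness_loss(actionness_logits: np.ndarray, target: np.ndarray) -> float:
    """Numerically stable binary cross-entropy on per-frame actionness logits."""
    x, y = actionness_logits, target
    per_frame = np.maximum(x, 0) - x * y + np.log1p(np.exp(-np.abs(x)))
    return float(per_frame.mean())


# Toy example: T=4 frames, C=2 classes; the video contains class 0 only.
frame_logits = np.array([[0.1, 2.0], [3.0, 0.2], [2.5, 0.1], [0.0, 1.5]])
labels = np.array([1, 0])
scores, idx = topk_pool(frame_logits, k=2)               # scores -> [2.75, 1.75]
target = actionness_targets(idx, labels, num_frames=4)  # target -> [0., 1., 1., 0.]
```

The actionness target ignores which class selected a frame. As a result, the extra head has to learn a single notion of "action" shared across classes, which is the intuition the abstract gives for reducing class bias.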

Code & Models

Repositories

layer6ai-labs/ASL
PyTorch · Official

Taxonomy

Topics: Human Pose and Action Recognition · Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning