Uncertainty-Aware Weakly Supervised Action Detection from Untrimmed Videos

Anurag Arnab; Chen Sun; Arsha Nagrani; Cordelia Schmid

arXiv:2007.10703·cs.CV·July 22, 2020

TL;DR

This paper introduces a weakly supervised spatio-temporal action detection model that is trained with only video-level labels and estimates the uncertainty of each prediction. It reports the first weakly-supervised results on AVA and state-of-the-art weakly-supervised results on UCF101-24.

Contribution

The paper presents a novel probabilistic Multiple Instance Learning (MIL) framework with uncertainty estimation for weakly supervised action detection in untrimmed videos.

Findings

01. First weakly-supervised results on the AVA dataset

02. State-of-the-art weakly-supervised results on UCF101-24

03. Effective handling of MIL assumption violations

Abstract

Despite the recent advances in video classification, progress in spatio-temporal action recognition has lagged behind. A major contributing factor has been the prohibitive cost of annotating videos frame-by-frame. In this paper, we present a spatio-temporal action recognition model that is trained with only video-level labels, which are significantly easier to annotate. Our method leverages per-frame person detectors which have been trained on large image datasets within a Multiple Instance Learning framework. We show how we can apply our method in cases where the standard Multiple Instance Learning assumption, that each bag contains at least one instance with the specified label, is invalid using a novel probabilistic variant of MIL where we estimate the uncertainty of each prediction. Furthermore, we report the first weakly-supervised results on the AVA dataset and state-of-the-art…
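
To make the Multiple Instance Learning setup in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It assumes a bag is the set of person detections from one video, that each instance has per-class logits, and that instance scores are max-pooled into one bag-level prediction trained against the video-level labels. The function names (`mil_bag_probs`, `bag_bce`, `attenuated_loss`) are hypothetical. The uncertainty term is also an assumption: the abstract does not give the formulation, so the sketch uses a generic learned-variance attenuation, which down-weights the loss for predictions the model marks as uncertain.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def mil_bag_probs(instance_logits):
    """Max-pool per-instance class probabilities into a bag-level vector.

    instance_logits: array of shape (num_instances, num_classes).
    Returns an array of shape (num_classes,).
    """
    return sigmoid(instance_logits).max(axis=0)

def bag_bce(bag_probs, labels, eps=1e-7):
    """Per-class binary cross-entropy between bag predictions and video-level labels."""
    p = np.clip(bag_probs, eps, 1.0 - eps)
    return -(labels * np.log(p) + (1.0 - labels) * np.log(1.0 - p))

def attenuated_loss(per_class_loss, log_var):
    """Assumed uncertainty weighting (not taken from the source):
    exp(-log_var) * loss + log_var, where log_var is a predicted log-variance.
    A large log_var shrinks the loss term but pays a penalty of log_var.
    """
    return np.exp(-log_var) * per_class_loss + log_var

# Toy bag: 3 person detections, 2 action classes (illustrative numbers only).
logits = np.array([[2.0, -3.0],
                   [-1.0, -2.0],
                   [0.0, -4.0]])
labels = np.array([1.0, 0.0])        # video-level labels: class 0 present, class 1 absent

bag = mil_bag_probs(logits)          # one probability per class for the whole bag
loss = bag_bce(bag, labels)          # per-class loss against video-level labels
weighted = attenuated_loss(loss, log_var=np.zeros(2))  # zero log-variance leaves the loss unchanged
```

Max-pooling encodes the standard MIL assumption that at least one instance in a positive bag carries the label. When that assumption fails, for example because the bag does not contain the person performing the labelled action, the bag-level loss can be misleading, which is the case the uncertainty weighting in this sketch is meant to soften.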
