Automatic Action Annotation in Weakly Labeled Videos
Waqas Sultani, Mubarak Shah

TL;DR
This paper introduces a weakly supervised method for automatically annotating human actions in videos, reducing manual effort and human bias by selecting representative proposals across multiple videos.
Contribution
It proposes a novel approach combining proposal ranking and a generalized maximum clique formulation to automatically generate spatio-temporal annotations in weakly labeled videos.
Findings
The method achieves promising results on the UCF Sports, sub-JHMDB, and THUMOS'13 datasets.
Classifiers trained on the automatic annotations perform comparably to those trained on ground-truth annotations.
The method also annotates multiple instances of an action within a single video.
Abstract
Manual spatio-temporal annotation of human action in videos is laborious, requires several annotators, and contains human biases. In this paper, we present a weakly supervised approach to automatically obtain spatio-temporal annotations of an actor in action videos. We first obtain a large number of action proposals in each video. To capture the few most representative action proposals in each video and avoid processing thousands of them, we rank them using optical flow and saliency in a 3D-MRF-based framework and select a few proposals using a MAP-based proposal subset selection method. We demonstrate that this ranking preserves the high-quality action proposals. Several such proposals are generated for each video of the same action. Our next challenge is to iteratively select one proposal from each video so that all proposals are globally consistent. We formulate this as Generalized…
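The selection step described in the abstract, picking one proposal per video so that the chosen proposals agree with one another, can be illustrated with a small Python sketch. Everything below is an assumption made for illustration: each proposal is represented by a hypothetical feature vector, consistency is measured by cosine similarity, and a simple coordinate-ascent local search stands in for the paper's solver, which this summary does not describe. The function names (`cosine`, `clique_score`, `select_consistent`) are likewise hypothetical.

```python
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = sqrt(sum(x * x for x in a))
    nb = sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def clique_score(videos, choice):
    """Sum of pairwise similarities among the chosen proposals."""
    total = 0.0
    for i in range(len(videos)):
        for j in range(i + 1, len(videos)):
            total += cosine(videos[i][choice[i]], videos[j][choice[j]])
    return total


def select_consistent(videos, max_iters=20):
    """Pick one proposal index per video by coordinate ascent.

    videos[v] is a list of feature vectors, one per candidate proposal,
    assumed to be ordered by rank (index 0 = top-ranked).
    This is a local-search heuristic (an assumption of this sketch),
    so it may stop at a local optimum rather than the best clique.
    """
    choice = [0] * len(videos)  # start from each video's top-ranked proposal
    for _ in range(max_iters):
        changed = False
        for v in range(len(videos)):
            # Similarity of candidate k in video v to the current picks elsewhere.
            def gain(k):
                return sum(
                    cosine(videos[v][k], videos[u][choice[u]])
                    for u in range(len(videos))
                    if u != v
                )

            best = max(range(len(videos[v])), key=gain)
            if gain(best) > gain(choice[v]) + 1e-12:
                choice[v] = best
                changed = True
        if not changed:  # no video wants to switch: converged
            break
    return choice
```

On a toy input of three videos with two candidate proposals each, for example `[[[1, 0], [0, 1]], [[0, 1], [1, 0.1]], [[0.9, 0.1], [0, 1]]]`, the search moves away from the top-ranked candidates and settles on the three proposals that share the same feature direction. An exhaustive search over all combinations would be exact but grows exponentially with the number of videos, which is why a heuristic is used in this sketch.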
