TL;DR
This paper introduces the novel task of win-fail action recognition to differentiate successful from failed attempts in videos, supported by a new dataset and analysis of current methods' limitations in achieving true action understanding.
Contribution
It presents the first paired win-fail action dataset, spanning four domains (General Stunts, Internet Wins-Fails, Trick Shots, and Party Games), and analyzes how existing action recognition models perform on the new task.
Findings
Current action recognition methods perform well on the win-fail task, but a large gap remains before they reach true action understanding.
High intra-class variation makes the task challenging yet feasible.
The dataset enables systematic analysis of win-fail action recognition.
Abstract
Current video/action understanding systems have demonstrated impressive performance on large recognition tasks. However, they might be limiting themselves to learning to recognize spatiotemporal patterns, rather than attempting to thoroughly understand the actions. To spur progress in the direction of a truer, deeper understanding of videos, we introduce the task of win-fail action recognition -- differentiating between successful and failed attempts at various activities. We introduce a first-of-its-kind paired win-fail action understanding dataset with samples from the following domains: "General Stunts," "Internet Wins-Fails," "Trick Shots," and "Party Games." Unlike in existing action recognition datasets, intra-class variation is high, making the task challenging, yet feasible. We systematically analyze the characteristics of the win-fail task/dataset with prototypical action…
