Mimetics: Towards Understanding Human Actions Out of Context
Philippe Weinzaepfel, Grégory Rogez

TL;DR
This paper introduces Mimetics, a new dataset for evaluating human action recognition without contextual cues, revealing that current models struggle without context and that pose-based methods are more robust.
Contribution
The paper presents Mimetics, a novel dataset for out-of-context action recognition, and demonstrates the effectiveness of pose-based models over traditional 3D CNNs in this setting.
Findings
State-of-the-art 3D CNNs perform poorly on mimed actions without context.
Pose-based models are less biased by scene or object context.
A simple neural network on pose features performs competitively.
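To make the last finding concrete, here is a minimal sketch of what a small classifier over per-frame pose features could look like. It is not the authors' architecture: the layer choice (one temporal convolution, ReLU, average pooling over time, a linear output layer), the feature size, the sequence length, and the random weights are all assumptions chosen for illustration. Only the 50-class output size comes from the dataset description.

```python
import numpy as np


def temporal_conv1d(x, w, b):
    """Valid 1D convolution over time.

    x: (T, D_in) per-frame pose features
    w: (K, D_in, D_out) kernel spanning K consecutive frames
    b: (D_out,) bias
    """
    k = w.shape[0]
    t_out = x.shape[0] - k + 1
    out = np.empty((t_out, w.shape[2]))
    for t in range(t_out):
        # Contract the (K, D_in) window of frames with the kernel.
        out[t] = np.tensordot(x[t:t + k], w, axes=([0, 1], [0, 1])) + b
    return out


def classify_pose_sequence(pose_feats, params):
    """Return class logits for one sequence of pose features."""
    hidden = temporal_conv1d(pose_feats, params["w"], params["b"])
    hidden = np.maximum(hidden, 0.0)   # ReLU
    pooled = hidden.mean(axis=0)       # average over time
    return pooled @ params["w_out"] + params["b_out"]


# Hypothetical sizes: 32 frames, 39-dim pose vector (e.g. 13 joints x 3 coords),
# 64 hidden channels, kernel of 3 frames. 50 classes matches the Mimetics subset.
T, D, H, C, K = 32, 39, 64, 50, 3
rng = np.random.default_rng(0)
params = {
    "w": rng.normal(scale=0.1, size=(K, D, H)),
    "b": np.zeros(H),
    "w_out": rng.normal(scale=0.1, size=(H, C)),
    "b_out": np.zeros(C),
}

pose_feats = rng.normal(size=(T, D))   # stand-in for real pose estimates
logits = classify_pose_sequence(pose_feats, params)
pred = int(np.argmax(logits))
```

Because the input contains only body pose, a model of this kind has no access to scene or object appearance, which is the intuition behind the reduced context bias noted above.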
Abstract
Recent methods for video action recognition have reached outstanding performances on existing benchmarks. However, they tend to leverage context such as scenes or objects instead of focusing on understanding the human action itself. For instance, a tennis field leads to the prediction playing tennis irrespectively of the actions performed in the video. In contrast, humans have a more complete understanding of actions and can recognize them without context. The best example of out-of-context actions are mimes, that people can typically recognize despite missing relevant objects and scenes. In this paper, we propose to benchmark action recognition methods in such absence of context and introduce a novel dataset, Mimetics, consisting of mimed actions for a subset of 50 classes from the Kinetics benchmark. Our experiments show that (a) state-of-the-art 3D convolutional neural networks…
Taxonomy
Methods: Convolution
