Is Appearance Free Action Recognition Possible?
Filip Ilic, Thomas Pock, Richard P. Wildes

TL;DR
This paper introduces the Appearance Free Dataset (AFD) to test whether modern action recognition models rely on static appearance or on motion. It finds that models struggle once static cues are removed, whereas humans do not.
Contribution
The paper presents AFD, a dataset that isolates dynamic information for action recognition. It evaluates existing models on AFD to expose their reliance on static cues, and it proposes a new architecture that emphasizes motion recovery.
Findings
All architectures perform worse on AFD than on RGB videos.
Humans recognize actions on AFD nearly as well as on RGB, outperforming models.
Explicit optical flow recovery improves performance on AFD and RGB.
Abstract
Intuition might suggest that motion and dynamic information are key to video-based action recognition. In contrast, there is evidence that state-of-the-art deep-learning video understanding architectures are biased toward static information available in single frames. Presently, a methodology and corresponding dataset to isolate the effects of dynamic information in video are missing. Their absence makes it difficult to understand how well contemporary architectures capitalize on dynamic vs. static information. We respond with a novel Appearance Free Dataset (AFD) for action recognition. AFD is devoid of static information relevant to action recognition in a single frame. Modeling of the dynamics is necessary for solving the task, as the action is only apparent through consideration of the temporal dimension. We evaluated 11 contemporary action recognition architectures on AFD as well…
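The idea of a video that carries no useful information in any single frame, yet still carries motion across frames, can be illustrated with a toy sketch. This is not the AFD construction pipeline, which this summary does not describe. It assumes a single random-noise image translated uniformly over time, and the helper names `appearance_free_clip` and `estimate_shift` are hypothetical. Each frame on its own is just noise, while the displacement between two frames can be recovered by cross-correlation.

```python
import numpy as np


def appearance_free_clip(num_frames, height, width, shift, seed=0):
    """Toy clip: one random-noise frame translated by `shift` = (dy, dx) per step.

    Assumption for illustration only: uniform wrap-around translation.
    """
    rng = np.random.default_rng(seed)
    first = rng.random((height, width))
    dy, dx = shift
    frames = [np.roll(first, (t * dy, t * dx), axis=(0, 1)) for t in range(num_frames)]
    return np.stack(frames)


def estimate_shift(frame_a, frame_b):
    """Recover the integer translation from frame_a to frame_b.

    Uses circular cross-correlation computed with the FFT; the location of the
    correlation peak is the displacement.
    """
    a = frame_a - frame_a.mean()
    b = frame_b - frame_b.mean()
    corr = np.fft.ifft2(np.fft.fft2(b) * np.conj(np.fft.fft2(a))).real
    dy, dx = np.unravel_index(np.argmax(corr), corr.shape)
    h, w = corr.shape
    # Map wrap-around indices to signed displacements.
    if dy > h // 2:
        dy -= h
    if dx > w // 2:
        dx -= w
    return int(dy), int(dx)


if __name__ == "__main__":
    clip = appearance_free_clip(num_frames=4, height=32, width=32, shift=(1, 2))
    print(clip.shape)
    print(estimate_shift(clip[0], clip[1]))
```

A classifier that looks at one frame of such a clip sees only noise, so any signal it extracts has to come from comparing frames, which is the property the abstract attributes to AFD.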
Taxonomy
Topics: Human Pose and Action Recognition · Anomaly Detection Techniques and Applications · Advanced Vision and Imaging
