Self-supervised Learning for Unintentional Action Prediction
Olga Zatsarynna, Yazan Abu Farha, Juergen Gall

TL;DR
This paper proposes a self-supervised learning approach that leverages global video context to improve the prediction, classification, and localization of unintentional or failed actions in videos, addressing annotation challenges.
Contribution
It introduces a novel self-supervised method that utilizes global video context, outperforming local neighborhood approaches for unintentional action understanding.
Findings
Effective for unintentional action classification
Improves localization and anticipation accuracy
Can be applied to anomaly detection in videos
Abstract
Distinguishing if an action is performed as intended or if an intended action fails is an important skill that not only humans have, but that is also important for intelligent systems that operate in human environments. Recognizing if an action is unintentional or anticipating if an action will fail, however, is not straightforward due to lack of annotated data. While videos of unintentional or failed actions can be found in the Internet in abundance, high annotation costs are a major bottleneck for learning networks for these tasks. In this work, we thus study the problem of self-supervised representation learning for unintentional action prediction. While previous works learn the representation based on a local temporal neighborhood, we show that the global context of a video is needed to learn a good representation for the three downstream tasks: unintentional action classification,…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsAnomaly Detection Techniques and Applications · Human Pose and Action Recognition · Gait Recognition and Analysis
