Objects2action: Classifying and localizing actions without any video example
Mihir Jain, Jan C. van Gemert, Thomas Mensink, Cees G. M. Snoek

TL;DR
This paper introduces Objects2action, a zero-shot video action recognition method that uses a semantic word embedding based on object categories, enabling action classification and localization without prior video examples.
Contribution
The paper presents a novel semantic embedding for actions and objects that allows zero-shot recognition and localization of actions in videos without needing attribute classifiers or class-to-attribute mappings.
Findings
Zero-shot action recognition is demonstrated on four datasets.
Actions can be localized in space and time without any training examples.
Exploiting multi-word descriptions and selecting the most responsive objects per action improves accuracy.
Abstract
The goal of this paper is to recognize actions in video without the need for examples. Different from traditional zero-shot approaches we do not demand the design and specification of attribute classifiers and class-to-attribute mappings to allow for transfer from seen classes to unseen classes. Our key contribution is objects2action, a semantic word embedding that is spanned by a skip-gram model of thousands of object categories. Action labels are assigned to an object encoding of unseen video based on a convex combination of action and object affinities. Our semantic embedding has three main characteristics to accommodate for the specifics of actions. First, we propose a mechanism to exploit multiple-word descriptions of actions and objects. Second, we incorporate the automated selection of the most responsive objects per action. And finally, we demonstrate how to extend our zero-shot…
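The scoring idea in the abstract can be illustrated with a minimal sketch: each action is scored as a convex combination of the video's object probabilities, weighted by action-object affinities in a word embedding. Everything concrete here is an assumption made for illustration and is not taken from the paper: the toy 3-d `word_vectors` stand in for a skip-gram model, multi-word labels are embedded by simple averaging, affinity is cosine similarity, and `top_k` is a hypothetical parameter for keeping only the most responsive objects per action.

```python
import numpy as np


def embed_phrase(phrase, word_vectors):
    """Embed a (possibly multi-word) label by averaging its word vectors.

    Averaging is an assumption of this sketch, not the paper's stated method.
    """
    vecs = [word_vectors[w] for w in phrase.lower().split() if w in word_vectors]
    if not vecs:
        raise KeyError(f"no known words in label: {phrase!r}")
    return np.mean(vecs, axis=0)


def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


def action_scores(object_probs, object_names, action_names, word_vectors, top_k=2):
    """Score each action as a convex combination of object probabilities."""
    object_probs = np.asarray(object_probs, dtype=float)
    obj_emb = [embed_phrase(o, word_vectors) for o in object_names]
    scores = {}
    for action in action_names:
        a = embed_phrase(action, word_vectors)
        # Affinity between the action and every object; clip so weights stay >= 0.
        aff = np.clip([cosine(a, o) for o in obj_emb], 0.0, None)
        # Keep only the top_k objects with the highest affinity to this action.
        keep = np.argsort(aff)[-top_k:]
        weights = np.zeros_like(aff)
        weights[keep] = aff[keep]
        # Normalize to sum to 1 so the score is a convex combination.
        total = weights.sum()
        if total > 0:
            weights /= total
        scores[action] = float(weights @ object_probs)
    return scores


# Toy embedding (hypothetical values, standing in for skip-gram vectors).
word_vectors = {
    "horse": np.array([1.0, 0.0, 0.0]),
    "riding": np.array([0.9, 0.1, 0.0]),
    "saddle": np.array([0.8, 0.0, 0.2]),
    "guitar": np.array([0.0, 1.0, 0.0]),
    "playing": np.array([0.0, 0.9, 0.1]),
    "ball": np.array([0.0, 0.0, 1.0]),
}
object_names = ["horse", "saddle", "guitar", "ball"]
action_names = ["riding horse", "playing guitar"]

# Object classifier output for one unseen video (hypothetical probabilities).
object_probs = [0.6, 0.3, 0.05, 0.05]

scores = action_scores(object_probs, object_names, action_names, word_vectors)
predicted = max(scores, key=scores.get)
print(predicted, scores)
```

No action video is used anywhere in this sketch: the only inputs are object classifier outputs and word vectors, which is what makes the setting zero-shot.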
Taxonomy
Topics: Human Pose and Action Recognition · Multimodal Machine Learning Applications · Anomaly Detection Techniques and Applications
