Objects2action: Classifying and localizing actions without any video example

Mihir Jain; Jan C. van Gemert; Thomas Mensink; Cees G. M. Snoek

arXiv:1510.06939·cs.CV·October 26, 2015·35 cites

Open Access

TL;DR

This paper introduces Objects2action, a zero-shot video action recognition method that uses a semantic word embedding based on object categories, enabling action classification and localization without prior video examples.

Contribution

The paper presents a novel semantic embedding for actions and objects that allows zero-shot recognition and localization of actions in videos without needing attribute classifiers or class-to-attribute mappings.

Findings

01 Effective zero-shot action recognition on four datasets.

02 Ability to localize actions in space and time without training examples.

03 Exploiting multiple-word descriptions and selecting the most responsive objects per action improves accuracy.

Abstract

The goal of this paper is to recognize actions in video without the need for examples. Different from traditional zero-shot approaches, we do not demand the design and specification of attribute classifiers and class-to-attribute mappings to allow for transfer from seen classes to unseen classes. Our key contribution is objects2action, a semantic word embedding that is spanned by a skip-gram model of thousands of object categories. Action labels are assigned to an object encoding of unseen video based on a convex combination of action and object affinities. Our semantic embedding has three main characteristics to accommodate the specifics of actions. First, we propose a mechanism to exploit multiple-word descriptions of actions and objects. Second, we incorporate the automated selection of the most responsive objects per action. And finally, we demonstrate how to extend our zero-shot…
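
The scoring idea described in the abstract (object scores for a video, combined through action-to-object affinities with convex weights, keeping only the most responsive objects) can be illustrated with a minimal sketch. Everything specific here is an assumption for illustration and not taken from the paper: the 3-dimensional toy word vectors, the object probabilities, averaging word vectors for multi-word descriptions, cosine similarity as the affinity, and the function names `mean_vector`, `action_object_weights`, and `score_actions`.

```python
import math


def mean_vector(words, embedding):
    """Average the word vectors of a (possibly multi-word) description.
    Averaging is an assumption of this sketch."""
    vectors = [embedding[w] for w in words]
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine(a, b):
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def action_object_weights(action, objects, embedding, top_k):
    """Keep the top_k objects most similar to the action and normalize their
    affinities so they are non-negative and sum to 1 (convex weights)."""
    action_vec = mean_vector(action.split(), embedding)
    sims = {
        o: max(cosine(action_vec, mean_vector(o.split(), embedding)), 0.0)
        for o in objects
    }
    kept = sorted(sims, key=sims.get, reverse=True)[:top_k]
    total = sum(sims[o] for o in kept)
    if total == 0:
        return {o: 1.0 / len(kept) for o in kept}
    return {o: sims[o] / total for o in kept}


def score_actions(object_probs, actions, embedding, top_k=2):
    """Score each action as a convex combination of the video's object scores."""
    objects = list(object_probs)
    scores = {}
    for action in actions:
        weights = action_object_weights(action, objects, embedding, top_k)
        scores[action] = sum(w * object_probs[o] for o, w in weights.items())
    return scores


# Toy word vectors standing in for a skip-gram embedding (hypothetical values).
embedding = {
    "ride": [1.0, 0.0, 0.2],
    "horse": [0.9, 0.1, 0.0],
    "saddle": [0.8, 0.0, 0.1],
    "play": [0.0, 1.0, 0.2],
    "guitar": [0.1, 0.9, 0.0],
    "piano": [0.0, 0.8, 0.3],
}

# Hypothetical object-classifier outputs for one unseen video.
object_probs = {"horse": 0.6, "saddle": 0.3, "guitar": 0.05, "piano": 0.05}

scores = score_actions(object_probs, ["ride horse", "play guitar"], embedding)
predicted = max(scores, key=scores.get)
```

In this toy setup no video of either action is ever used: the prediction depends only on which objects the classifier reports and on how close those object names lie to the action name in the embedding. A real system would replace the toy vectors with a trained skip-gram model and the object probabilities with the outputs of an object classifier over thousands of categories.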

Taxonomy

Topics: Human Pose and Action Recognition · Multimodal Machine Learning Applications · Anomaly Detection Techniques and Applications