Multi-Label Zero-Shot Human Action Recognition via Joint Latent Ranking Embedding

Qian Wang; Ke Chen

arXiv:1709.05107 · cs.CV · February 7, 2019 · 1 citation

TL;DR

This paper introduces a framework for multi-label zero-shot human action recognition. It handles videos in which the temporal boundaries of individual actions are unknown, exploits semantic relationships between action labels, and reports improved recognition performance on the Breakfast and Charades datasets.

Contribution

The paper proposes a joint latent ranking embedding model, with a novel neural architecture and learning algorithm, for multi-label zero-shot human action recognition.
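
As a rough illustration of the general idea behind ranking in a joint latent space, the sketch below projects a video feature and label embeddings into a shared space, scores each label by a dot product, and computes a pairwise hinge ranking loss that pushes relevant labels above irrelevant ones. This is not the paper's architecture or learning algorithm: the linear projections, the hinge loss, the margin, and every name here (`project`, `score_labels`, `pairwise_ranking_loss`, `rank_labels`) are assumptions chosen for illustration. Because unseen labels are scored through their embeddings in the same way as seen ones, this kind of scoring extends naturally to the zero-shot setting.

```python
from typing import List, Sequence, Set

Vector = List[float]
Matrix = List[List[float]]


def project(W: Matrix, x: Sequence[float]) -> Vector:
    """Linear map W @ x; a stand-in (assumption) for a learned embedding network."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def score_labels(video_latent: Sequence[float],
                 label_latents: Sequence[Sequence[float]]) -> List[float]:
    """Score each label by its dot product with the video in the latent space."""
    return [sum(a * b for a, b in zip(video_latent, lab)) for lab in label_latents]


def pairwise_ranking_loss(scores: Sequence[float],
                          relevant: Set[int],
                          margin: float = 1.0) -> float:
    """Hinge loss summed over (relevant, irrelevant) label pairs.

    A pair contributes zero once the relevant label outscores the
    irrelevant one by at least `margin`.
    """
    loss = 0.0
    for p in relevant:
        for n in range(len(scores)):
            if n not in relevant:
                loss += max(0.0, margin - scores[p] + scores[n])
    return loss


def rank_labels(scores: Sequence[float]) -> List[int]:
    """Return label indices ordered from highest to lowest score."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


if __name__ == "__main__":
    # Hypothetical toy data: identity projections and 2-D embeddings.
    identity = [[1.0, 0.0], [0.0, 1.0]]
    video = project(identity, [1.0, 0.0])
    labels = [project(identity, e) for e in ([1.0, 0.0], [0.5, 0.5], [-1.0, 0.0])]
    scores = score_labels(video, labels)
    print(rank_labels(scores), pairwise_ranking_loss(scores, {0, 1}))
```

At prediction time, a multi-label decision could be made by keeping the top-ranked labels or those above a threshold; which rule the paper uses is not stated in this summary.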

Findings

01. Effective on the Breakfast and Charades datasets
02. Outperforms existing methods in zero-shot recognition
03. Introduces a new data split scheme for evaluation

Abstract

Human action recognition refers to automatically recognizing human actions from a video clip. In reality, there often exist multiple human actions in a video stream. Such a video stream is often weakly annotated with a set of relevant human action labels at a global level, rather than having each label assigned to a specific video episode corresponding to a single action, which leads to a multi-label learning problem. Furthermore, there are many meaningful human actions in reality, but it would be extremely difficult to collect and annotate video clips for all of these actions, which leads to a zero-shot learning scenario. To the best of our knowledge, there is no work that has addressed all the above issues together in human action recognition. In this paper, we formulate a real-world human action recognition task as a multi-label zero-shot learning problem and propose a framework to…

Taxonomy

Topics: Human Pose and Action Recognition · Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications