Multi-Label Zero-Shot Human Action Recognition via Joint Latent Ranking Embedding
Qian Wang, Ke Chen

TL;DR
This paper introduces a holistic framework for multi-label zero-shot human action recognition, addressing challenges of unknown temporal boundaries and leveraging semantic relationships, leading to improved recognition performance.
Contribution
It proposes a joint latent ranking embedding model with a novel neural architecture and learning algorithm for multi-label zero-shot human action recognition.
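To make the idea of a ranking embedding concrete, here is a minimal sketch. It is not the authors' architecture or learning algorithm. It assumes that video features and label semantic vectors are each mapped linearly into a shared latent space, that a label's relevance is the dot product in that space, and that training uses a pairwise hinge loss pushing relevant labels above irrelevant ones. All names (`project`, `label_scores`, `pairwise_ranking_loss`, `rank_labels`) and the toy numbers are hypothetical.

```python
from typing import List, Sequence, Set

Vector = Sequence[float]


def dot(u: Vector, v: Vector) -> float:
    """Plain dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def project(matrix: Sequence[Vector], x: Vector) -> List[float]:
    """Apply a linear map (one row per latent dimension) to x."""
    return [dot(row, x) for row in matrix]


def label_scores(video_latent: Vector, label_latents: Sequence[Vector]) -> List[float]:
    """Score every label by its dot product with the video in latent space."""
    return [dot(video_latent, z) for z in label_latents]


def pairwise_ranking_loss(scores: Sequence[float], relevant: Set[int],
                          margin: float = 1.0) -> float:
    """Mean hinge penalty over (relevant, irrelevant) label pairs.

    A pair is penalised when the relevant label does not outscore the
    irrelevant one by at least `margin`.
    """
    irrelevant = [j for j in range(len(scores)) if j not in relevant]
    if not relevant or not irrelevant:
        return 0.0
    total = sum(max(0.0, margin - scores[p] + scores[n])
                for p in relevant for n in irrelevant)
    return total / (len(relevant) * len(irrelevant))


def rank_labels(scores: Sequence[float]) -> List[int]:
    """Return label indices sorted from most to least relevant."""
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)


# Toy example (assumed values): identity projections, three labels.
identity = [[1.0, 0.0], [0.0, 1.0]]
video_latent = project(identity, [1.0, 0.0])
label_latents = [project(identity, v) for v in ([1.0, 0.0], [0.0, 1.0], [0.5, 0.5])]

scores = label_scores(video_latent, label_latents)
loss = pairwise_ranking_loss(scores, relevant={0})
ranking = rank_labels(scores)
```

Because labels are scored through their semantic vectors rather than through per-class weights, the same scoring function can in principle rank labels that never appeared in training, which is what makes this kind of embedding usable in a zero-shot setting.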
Findings
Effective on Breakfast and Charades datasets
Outperforms existing methods in zero-shot recognition
Introduces a new data split scheme for evaluation
Abstract
Human action recognition refers to automatically recognizing human actions from a video clip. In reality, a video stream often contains multiple human actions. Such a video stream is often weakly annotated with a set of relevant human action labels at a global level, rather than having each label assigned to the specific video episode corresponding to a single action, which leads to a multi-label learning problem. Furthermore, there are many meaningful human actions in reality, but it would be extremely difficult to collect and annotate video clips for all of them, which leads to a zero-shot learning scenario. To the best of our knowledge, no prior work has addressed all of the above issues together in human action recognition. In this paper, we formulate a real-world human action recognition task as a multi-label zero-shot learning problem and propose a framework to…
Taxonomy
Topics: Human Pose and Action Recognition · Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications
