Transductive Zero-Shot Action Recognition by Word-Vector Embedding
Xun Xu, Timothy Hospedales, Shaogang Gong

TL;DR
This paper proposes a transductive zero-shot action recognition method using word-vector embeddings to map videos and categories into a shared semantic space, addressing the challenge of recognizing unseen actions without training data.
Contribution
It introduces a novel approach that leverages word-vectors for zero-shot action recognition and explores transductive strategies to improve generalization across categories.
Findings
Enhanced recognition accuracy over baseline methods
Effective handling of domain shift in zero-shot learning
Demonstrated applicability to complex video datasets
Abstract
The number of categories for action recognition is growing rapidly and it has become increasingly hard to label sufficient training data for learning conventional models for all categories. Instead of collecting ever more data and labelling them exhaustively for all categories, an attractive alternative approach is "zero-shot learning" (ZSL). To that end, in this study we construct a mapping between visual features and a semantic descriptor of each action category, allowing new categories to be recognised in the absence of any visual training data. Existing ZSL studies focus primarily on still images and attribute-based semantic representations. In this work, we explore word-vectors as the shared semantic space to embed videos and category labels for ZSL action recognition. This is a more challenging problem than existing ZSL of still images and/or attributes, because the mapping…
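The mapping idea in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: it assumes a ridge-regression map from visual features to the word-vector space and cosine nearest-neighbour matching against unseen-class prototypes, neither of which this summary confirms. The data are synthetic, and every name (`fit_ridge`, `zero_shot_predict`, `sample`, the matrix `A`) is hypothetical.

```python
import numpy as np

def fit_ridge(X, Z, lam=1e-2):
    """Closed-form ridge regression: W maps d-dim visual features to k-dim word vectors."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ Z)

def l2_normalize(A):
    """Scale each row to unit length so dot products become cosine similarities."""
    return A / np.maximum(np.linalg.norm(A, axis=1, keepdims=True), 1e-12)

def zero_shot_predict(X_test, W, prototypes):
    """Project test features into the semantic space and pick the closest class prototype."""
    proj = l2_normalize(X_test @ W)
    protos = l2_normalize(prototypes)
    return np.argmax(proj @ protos.T, axis=1)

# Synthetic stand-ins (assumptions): random "word vectors" for class names and
# visual features generated from them through a hidden linear map plus noise.
rng = np.random.default_rng(0)
k, d = 5, 20                                # semantic and visual dimensionality
seen_protos = rng.normal(size=(10, k))      # classes with labelled training videos
unseen_protos = rng.normal(size=(3, k))     # classes with no training videos
A = rng.normal(size=(k, d))                 # hidden semantic-to-visual relation

def sample(protos, n_per_class):
    """Draw noisy visual features for each class prototype."""
    labels = np.repeat(np.arange(len(protos)), n_per_class)
    X = protos[labels] @ A + 0.01 * rng.normal(size=(len(labels), d))
    return X, labels

# Train the mapping on seen classes only.
X_train, y_train = sample(seen_protos, 30)
W = fit_ridge(X_train, seen_protos[y_train])

# Recognise unseen classes using only their word-vector prototypes.
X_test, y_test = sample(unseen_protos, 10)
pred = zero_shot_predict(X_test, W, unseen_protos)
accuracy = float((pred == y_test).mean())
print(f"zero-shot accuracy on synthetic data: {accuracy:.2f}")
```

The sketch is inductive: it never looks at the unlabelled test set when fitting `W`. A transductive variant, as the paper's title suggests, would additionally exploit the unlabelled test videos, but the details of that step are not given in this summary.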
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Human Pose and Action Recognition · Multimodal Machine Learning Applications
