Multi-Semantic Fusion Model for Generalized Zero-Shot Skeleton-Based Action Recognition
Ming-Zhe Li, Zhen Jia, Zhang Zhang, Zhanyu Ma, and Liang Wang

TL;DR
This paper introduces a multi-semantic fusion model that leverages rich textual descriptions and skeleton features to improve generalized zero-shot skeleton-based action recognition, addressing the limitations of previous label-only approaches.
Contribution
The proposed MSF model combines multiple semantic sources with VAE-based alignment to improve recognition of unseen actions in GZSSAR.
Findings
Outperforms previous models in GZSSAR tasks
Effectively utilizes action and motion descriptions for better generalization
Demonstrates robustness in recognizing unseen actions
Abstract
Generalized zero-shot skeleton-based action recognition (GZSSAR) is a new and challenging problem in the computer vision community, which requires models to recognize actions without any training samples. Previous studies only utilize the action labels of verb phrases as the semantic prototypes for learning the mapping from skeleton-based actions to a shared semantic space. However, the limited semantic information of action labels restricts the generalization ability of skeleton features for recognizing unseen actions. To solve this dilemma, we propose a multi-semantic fusion (MSF) model for improving the performance of GZSSAR, where two kinds of class-level textual descriptions (i.e., action descriptions and motion descriptions) are collected as auxiliary semantic information to enhance the learning efficacy of generalizable skeleton features. Specifically, a pre-trained language…
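The idea of matching skeleton features against semantic prototypes in a shared space can be illustrated with a small sketch. This is not the paper's MSF model: the fusion by concatenation, the cosine-similarity classifier, the toy embedding values, and all names (`fuse`, `classify`, `prototypes`) are assumptions made for illustration, and the VAE-based alignment is not reproduced. The sketch only shows how a prototype built from a label, an action description, and a motion description could let a model pick an unseen class.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def fuse(label_emb, action_desc_emb, motion_desc_emb):
    """Fuse three semantic embeddings into one prototype.

    Assumption: plain concatenation; the paper's fusion may differ.
    """
    return list(label_emb) + list(action_desc_emb) + list(motion_desc_emb)


def classify(skeleton_emb, prototypes):
    """Return the class whose prototype is most similar to the skeleton embedding."""
    return max(prototypes, key=lambda name: cosine(skeleton_emb, prototypes[name]))


# Toy, hand-made embeddings (hypothetical values, not from any language model).
prototypes = {
    "wave": fuse([1, 0], [1, 0], [0, 1]),  # seen class
    "clap": fuse([0, 1], [1, 0], [1, 0]),  # seen class
    "kick": fuse([1, 1], [0, 1], [0, 1]),  # unseen class: no skeleton training samples
}

# A skeleton feature assumed to be already mapped into the shared semantic space.
query = [0.9, 0.8, 0.1, 1.0, 0.0, 0.9]
predicted = classify(query, prototypes)
print(predicted)
```

In the generalized setting the candidate set contains both seen and unseen classes, which is why `prototypes` mixes the two here; richer prototypes give the unseen class more dimensions along which it can be told apart from the seen ones.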
Taxonomy
Topics: Human Pose and Action Recognition · Anomaly Detection Techniques and Applications · Hand Gesture Recognition Systems
