Part-aware Unified Representation of Language and Skeleton for Zero-shot Action Recognition
Anqi Zhu, Qiuhong Ke, Mingming Gong, James Bailey

TL;DR
This paper introduces PURLS, a novel method for zero-shot skeleton-based action recognition that aligns local and global visual features with language descriptions, significantly improving transferability to unseen classes.
Contribution
PURLS employs a new prompting and partitioning module to enhance visual-semantic alignment at multiple levels, advancing zero-shot action recognition.
Findings
Outperforms prior skeleton-based methods on multiple datasets.
Effective in transferring knowledge to unseen action classes.
Demonstrates universality across various backbones and datasets.
Abstract
While remarkable progress has been made on supervised skeleton-based action recognition, the challenge of zero-shot recognition remains relatively unexplored. In this paper, we argue that relying solely on aligning label-level semantics and global skeleton features is insufficient to effectively transfer locally consistent visual knowledge from seen to unseen classes. To address this limitation, we introduce Part-aware Unified Representation between Language and Skeleton (PURLS) to explore visual-semantic alignment at both local and global scales. PURLS introduces a new prompting module and a novel partitioning module to generate aligned textual and visual representations across different levels. The former leverages a pre-trained GPT-3 to infer refined descriptions of the global and local (body-part-based and temporal-interval-based) movements from the original action labels. The…
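The idea of aligning skeleton features with language at both global and local scales can be illustrated with a minimal sketch. This is not the PURLS implementation: the function names, the level names ("global", "arms"), the toy vectors, and the choice to average cosine similarities across levels are all assumptions made for illustration. The abstract above is truncated before it describes how the partitioning module actually produces and combines part-level features.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def zero_shot_predict(visual, text_by_class):
    """Pick the unseen class whose descriptions best match the skeleton features.

    visual:        dict mapping a level name to a visual feature vector
    text_by_class: dict mapping a class label to a dict of level name -> text embedding
    Averaging over levels is an illustrative assumption, not the paper's fusion rule.
    """
    scores = {}
    for label, text in text_by_class.items():
        sims = [cosine(visual[level], text[level]) for level in visual]
        scores[label] = sum(sims) / len(sims)
    best = max(scores, key=scores.get)
    return best, scores


# Hypothetical toy features: one global vector and one body-part vector.
visual = {"global": [1.0, 0.0], "arms": [0.0, 1.0]}
text_by_class = {
    "wave": {"global": [1.0, 0.0], "arms": [0.0, 1.0]},
    "kick": {"global": [0.0, 1.0], "arms": [1.0, 0.0]},
}
prediction, scores = zero_shot_predict(visual, text_by_class)
```

In this toy setup the class labels were never used for training; a class is recognized only because its global and part-level descriptions lie close to the corresponding skeleton features in the shared space.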
Taxonomy
Topics: Human Pose and Action Recognition · Anomaly Detection Techniques and Applications · Multimodal Machine Learning Applications
Methods: Linear Layer · Cosine Annealing · Multi-Head Attention · Residual Connection · Softmax · Layer Normalization · Byte Pair Encoding · Attention Dropout
