Part-aware Unified Representation of Language and Skeleton for Zero-shot Action Recognition

Anqi Zhu; Qiuhong Ke; Mingming Gong; James Bailey

arXiv:2406.13327·cs.CV·June 21, 2024

TL;DR

This paper introduces PURLS, a novel method for zero-shot skeleton-based action recognition that aligns local and global visual features with language descriptions, significantly improving transferability to unseen classes.

Contribution

PURLS employs a new prompting and partitioning module to enhance visual-semantic alignment at multiple levels, advancing zero-shot action recognition.

Findings

01 Outperforms prior skeleton-based methods on multiple datasets.

02 Effective in transferring knowledge to unseen action classes.

03 Demonstrates universality across various backbones and datasets.

Abstract

While remarkable progress has been made on supervised skeleton-based action recognition, the challenge of zero-shot recognition remains relatively unexplored. In this paper, we argue that relying solely on aligning label-level semantics and global skeleton features is insufficient to effectively transfer locally consistent visual knowledge from seen to unseen classes. To address this limitation, we introduce Part-aware Unified Representation between Language and Skeleton (PURLS) to explore visual-semantic alignment at both local and global scales. PURLS introduces a new prompting module and a novel partitioning module to generate aligned textual and visual representations across different levels. The former leverages a pre-trained GPT-3 to infer refined descriptions of the global and local (body-part-based and temporal-interval-based) movements from the original action labels. The…
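The idea of aligning visual and textual representations at both global and local scales can be illustrated with a toy zero-shot classifier. The following is a minimal sketch, not the PURLS implementation: the function names (`cosine`, `zero_shot_predict`), the scale names (`global`, `arms`, `legs`), the hand-written feature vectors, and the choice to average cosine similarity over scales are all assumptions made for illustration. In practice the vectors would come from a skeleton encoder and a text encoder.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def zero_shot_predict(skeleton_feats, class_text_feats):
    """Pick the class whose text features best match the skeleton features.

    skeleton_feats:   {scale_name: vector} for one skeleton sequence.
    class_text_feats: {label: {scale_name: vector}} for candidate (unseen) classes.
    Scoring rule (an assumption for this sketch): mean cosine similarity
    over the scales that both sides provide.
    """
    scores = {}
    for label, text_feats in class_text_feats.items():
        shared = [s for s in skeleton_feats if s in text_feats]
        sims = [cosine(skeleton_feats[s], text_feats[s]) for s in shared]
        scores[label] = sum(sims) / len(sims) if sims else float("-inf")
    best = max(scores, key=scores.get)
    return best, scores


# Hypothetical toy features: the global vectors are identical for both classes,
# so only the part-level (local) features can separate them.
skeleton = {"global": [1.0, 1.0], "arms": [1.0, 0.0], "legs": [0.0, 1.0]}
classes = {
    "wave": {"global": [1.0, 1.0], "arms": [1.0, 0.0], "legs": [0.0, 1.0]},
    "kick": {"global": [1.0, 1.0], "arms": [0.0, 1.0], "legs": [1.0, 0.0]},
}

label, scores = zero_shot_predict(skeleton, classes)
global_only, _ = zero_shot_predict({"global": skeleton["global"]}, classes)
print(label, scores)
```

In this toy setup, scoring on the global feature alone yields a tie between the two classes, while adding the part-level scales separates them. That is the intuition behind aligning at both scales, although the averaging rule here stands in for whatever fusion the paper actually uses.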

Code & Models

Repositories

azzh1/purls (PyTorch, official)

Taxonomy

Topics: Human Pose and Action Recognition · Anomaly Detection Techniques and Applications · Multimodal Machine Learning Applications

Methods: Linear Layer · Cosine Annealing · Multi-Head Attention · Residual Connection · Softmax · Layer Normalization · Byte Pair Encoding · Attention Dropout