DynaPURLS: Dynamic Refinement of Part-aware Representations for Skeleton-based Zero-Shot Action Recognition
Jingmin Zhu, Anqi Zhu, James Bailey, Jun Liu, Hossein Rahmani, Mohammed Bennamoun, Farid Boussaid, and Qiuhong Ke

TL;DR
DynaPURLS is a dynamic, multi-scale approach to skeleton-based zero-shot action recognition: it uses hierarchical textual descriptions and adaptive visual-semantic alignment to improve generalization to unseen classes.
Contribution
The paper proposes DynaPURLS, a framework that builds multi-scale visual-semantic correspondences from LLM-generated hierarchical descriptions and adaptively partitioned skeleton joints, then refines those correspondences at inference time to address the domain shift between seen and unseen classes in zero-shot recognition.
Findings
Achieves state-of-the-art results on NTU RGB+D 60/120 and PKU-MMD datasets.
Mitigates domain shift through dynamic refinement and a confidence-aware memory bank.
Significantly improves zero-shot recognition accuracy over prior methods.
Abstract
Zero-shot skeleton-based action recognition (ZS-SAR) is fundamentally constrained by prevailing approaches that rely on aligning skeleton features with static, class-level semantics. This coarse-grained alignment fails to bridge the domain shift between seen and unseen classes, thereby impeding the effective transfer of fine-grained visual knowledge. To address these limitations, we introduce DynaPURLS, a unified framework that establishes robust, multi-scale visual-semantic correspondences and dynamically refines them at inference time to enhance generalization. Our framework leverages a large language model to generate hierarchical textual descriptions that encompass both global movements and local body-part dynamics. Concurrently, an adaptive partitioning module produces fine-grained visual representations by semantically grouping skeleton joints. To fortify this…
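The multi-scale alignment idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the array shapes, and the scoring rule (mean cosine similarity over one global scale plus several body-part scales) are assumptions made for illustration, and the inference-time refinement step is not modeled.

```python
import numpy as np

def l2_normalize(x, axis=-1, eps=1e-8):
    # Scale vectors to unit length so dot products become cosine similarities.
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def multi_scale_scores(visual, text):
    """Score one skeleton sample against every candidate class.

    visual: (S, D) array, one feature per scale (hypothetical layout:
            scale 0 is global, the rest are body-part groups).
    text:   (C, S, D) array, one description embedding per class and scale.
    Returns a (C,) array: mean cosine similarity across scales per class.
    """
    v = l2_normalize(visual)               # (S, D)
    t = l2_normalize(text)                 # (C, S, D)
    sims = np.einsum("sd,csd->cs", v, t)   # per-class, per-scale cosine
    return sims.mean(axis=1)

def predict(visual, text):
    # Zero-shot prediction: pick the unseen class with the highest score.
    return int(np.argmax(multi_scale_scores(visual, text)))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    num_classes, num_scales, dim = 4, 3, 16
    text_emb = rng.normal(size=(num_classes, num_scales, dim))
    # Toy sample: a noisy copy of class 2's embeddings stands in for
    # skeleton features that are well aligned with that class.
    sample = text_emb[2] + 0.05 * rng.normal(size=(num_scales, dim))
    print(predict(sample, text_emb))
```

Averaging over scales is only one possible fusion rule; a weighted sum or a learned combination would slot into `multi_scale_scores` the same way.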
Taxonomy
Topics: Human Pose and Action Recognition · Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning
