SkeletonContext: Skeleton-side Context Prompt Learning for Zero-Shot Skeleton-based Action Recognition

Ning Wang; Tieyue Wu; Naeha Sharif; Farid Boussaid; Guangming Zhu; Lin Mei; Mohammed Bennamoun; zhang liang

arXiv:2603.29692 · cs.CV · April 1, 2026

TL;DR

SkeletonContext introduces a prompt-based framework that incorporates language-driven contextual semantics into skeleton motion representations, significantly improving zero-shot skeleton-based action recognition performance.

Contribution

The paper proposes a Cross-Modal Context Prompt Module and a Key-Part Decoupling Module to improve semantic grounding and robustness when recognizing unseen actions.

Findings

01. Achieves state-of-the-art results on multiple benchmarks.
02. Effectively incorporates contextual cues to distinguish similar actions.
03. Improves zero-shot recognition accuracy in skeleton-based action recognition.

Abstract

Zero-shot skeleton-based action recognition aims to recognize unseen actions by transferring knowledge from seen categories through semantic descriptions. Most existing methods align skeleton features with textual embeddings within a shared latent space. However, the absence of contextual cues, such as the objects involved in an action, introduces an inherent gap between skeleton and semantic representations, making it difficult to distinguish visually similar actions. To address this, we propose SkeletonContext, a prompt-based framework that enriches skeletal motion representations with language-driven contextual semantics. Specifically, we introduce a Cross-Modal Context Prompt Module, which leverages a pretrained language model to reconstruct masked contextual prompts under guidance derived from LLMs. This design effectively transfers linguistic context to the skeleton encoder…
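
The alignment baseline that the abstract attributes to most existing methods (matching a skeleton feature against text embeddings in a shared latent space) can be illustrated with a minimal sketch. This is not the SkeletonContext method: the function names, the toy 3-D vectors, and the use of cosine similarity as the matching score are all assumptions made for illustration. A real system would obtain the vectors from a trained skeleton encoder and a text encoder.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (the zero vector is returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    a_n, b_n = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a_n, b_n))


def zero_shot_predict(skeleton_feature, class_text_embeddings):
    """Pick the unseen class whose text embedding is closest to the skeleton feature.

    skeleton_feature: vector assumed to come from a skeleton encoder (hypothetical).
    class_text_embeddings: dict mapping class name -> text embedding (hypothetical).
    """
    scores = {
        name: cosine_similarity(skeleton_feature, emb)
        for name, emb in class_text_embeddings.items()
    }
    best = max(scores, key=scores.get)
    return best, scores


# Toy, hand-made embeddings (illustrative values only, not from the paper).
unseen_classes = {
    "drink water": [0.9, 0.1, 0.0],
    "brush teeth": [0.1, 0.9, 0.0],
}
skeleton_feature = [0.8, 0.3, 0.1]

label, scores = zero_shot_predict(skeleton_feature, unseen_classes)
print(label, scores)
```

In this setup, two actions with near-identical motion produce near-identical skeleton features and therefore near-identical scores, which is the gap the abstract says contextual cues are meant to close.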
