Boosting Skeleton-based Zero-Shot Action Recognition with Training-Free Test-Time Adaptation

Jingmin Zhu; Anqi Zhu; Hossein Rahmani; Jun Liu; Mohammed Bennamoun; Qiuhong Ke

arXiv:2512.11458 · cs.CV · December 15, 2025

TL;DR

Skeleton-Cache is a training-free test-time adaptation framework for skeleton-based zero-shot action recognition. It uses a non-parametric cache and large language models to improve generalization to unseen actions.

Contribution

It introduces Skeleton-Cache, the first training-free test-time adaptation method for skeleton-based zero-shot action recognition (SZAR). The method combines structured skeleton descriptors with LLM-guided semantic priors to recognize unseen actions more accurately.

Findings

01

Consistently improves SZAR performance on the NTU RGB+D 60/120 and PKU-MMD II benchmarks.

02

Effective in both zero-shot and generalized zero-shot settings.

03

Requires no additional training and no access to training data for adaptation.

Abstract

We introduce Skeleton-Cache, the first training-free test-time adaptation framework for skeleton-based zero-shot action recognition (SZAR), aimed at improving model generalization to unseen actions during inference. Skeleton-Cache reformulates inference as a lightweight retrieval process over a non-parametric cache that stores structured skeleton representations, combining both global and fine-grained local descriptors. To guide the fusion of descriptor-wise predictions, we leverage the semantic reasoning capabilities of large language models (LLMs) to assign class-specific importance weights. By integrating these structured descriptors with LLM-guided semantic priors, Skeleton-Cache dynamically adapts to unseen actions without any additional training or access to training data. Extensive experiments on NTU RGB+D 60/120 and PKU-MMD II demonstrate that Skeleton-Cache consistently boosts…
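The abstract frames inference as retrieval over a non-parametric cache of global and local skeleton descriptors. LLM-assigned, class-specific weights then fuse the per-descriptor predictions. The Python sketch below illustrates that general idea under our own assumptions and is not the paper's implementation. Several design choices are borrowed or invented for illustration:

- The cache policy keeps the lowest-entropy test samples per pseudo-label, in the style of TDA.
- The affinity kernel is `exp(-beta * (1 - cos))`, as used by Tip-Adapter.
- The descriptor names (`global`, `arms`, `legs`), the helpers `SkeletonCache` and `predict_and_update`, and the hard-coded `weights` table standing in for LLM output are all hypothetical.

```python
import math
from collections import defaultdict


def normalize(v):
    """Scale a vector to unit L2 norm (zero vectors are returned unchanged)."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n > 0 else list(v)


def cosine(a, b):
    """Cosine similarity between two vectors."""
    return sum(x * y for x, y in zip(normalize(a), normalize(b)))


def softmax(scores, temperature=0.1):
    """Numerically stable softmax with a temperature."""
    m = max(scores)
    exps = [math.exp((s - m) / temperature) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    """Shannon entropy in nats; lower means a more confident prediction."""
    return -sum(p * math.log(p) for p in probs if p > 0)


class SkeletonCache:
    """Hypothetical non-parametric cache. For each pseudo-label it keeps the
    `capacity` lowest-entropy test samples, each stored as a dict mapping a
    descriptor name (global or local body part) to a feature vector."""

    def __init__(self, descriptors, capacity=3, beta=5.0):
        self.descriptors = descriptors    # e.g. ["global", "arms", "legs"] (assumed)
        self.capacity = capacity          # max entries per class (assumed)
        self.beta = beta                  # sharpness of the affinity kernel (assumed)
        self.entries = defaultdict(list)  # class -> [(entropy, feats), ...]

    def update(self, label, ent, feats):
        """Insert a sample, then keep only the most confident entries."""
        bucket = self.entries[label]
        bucket.append((ent, feats))
        bucket.sort(key=lambda e: e[0])
        del bucket[self.capacity:]

    def descriptor_logits(self, name, query, classes):
        """Per-class retrieval score for one descriptor:
        sum over cached entries of exp(-beta * (1 - cos(query, key)))."""
        return {
            c: sum(math.exp(-self.beta * (1.0 - cosine(query, f[name])))
                   for _, f in self.entries.get(c, []))
            for c in classes
        }


def predict_and_update(feats, text_emb, cache, weights, alpha=1.0):
    """Classify one test skeleton, then cache it under its pseudo-label."""
    classes = list(text_emb)
    # Zero-shot score: global descriptor vs. class text embedding (assumption).
    fused = [cosine(feats["global"], text_emb[c]) for c in classes]
    for name in cache.descriptors:
        votes = cache.descriptor_logits(name, feats[name], classes)
        for i, c in enumerate(classes):
            # weights[c][name] stands in for an LLM-assigned, class-specific
            # importance of this descriptor (hypothetical values).
            fused[i] += alpha * weights[c][name] * votes[c]
    probs = softmax(fused)
    label = classes[probs.index(max(probs))]
    cache.update(label, entropy(probs), feats)
    return label


if __name__ == "__main__":
    text_emb = {"wave": [1.0, 0.0], "kick": [0.0, 1.0]}  # toy 2-D text embeddings
    weights = {"wave": {"global": 0.2, "arms": 0.7, "legs": 0.1},
               "kick": {"global": 0.2, "arms": 0.1, "legs": 0.7}}
    cache = SkeletonCache(["global", "arms", "legs"], capacity=2)
    wave = {"global": [0.9, 0.1], "arms": [1.0, 0.0], "legs": [0.2, 0.8]}
    kick = {"global": [0.1, 0.9], "arms": [0.0, 1.0], "legs": [0.0, 1.0]}
    print(predict_and_update(wave, text_emb, cache, weights))  # wave
    print(predict_and_update(kick, text_emb, cache, weights))  # kick
```

In the toy run, `weights["wave"]["legs"]` is small. As a result, the cached waving sample, whose leg descriptor resembles the kick query's legs, adds little to the kick query's `wave` score. This mirrors the abstract's class-specific weighting only in spirit. The abstract does not say how the paper's weights are obtained from the LLM or applied.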

Taxonomy

Topics: Human Pose and Action Recognition · Robot Manipulation and Learning · Multimodal Machine Learning Applications