SkeletonAgent: An Agentic Interaction Framework for Skeleton-based Action Recognition

Hongda Liu; Yunfan Liu; Changlu Wang; Yunlong Wang; Zhenan Sun

arXiv:2511.22433 · cs.CV · March 13, 2026



TL;DR

SkeletonAgent introduces a cooperative framework for skeleton-based action recognition that links the recognition model and an LLM in a feedback loop, so that the LLM's semantic priors target the classes the recognizer confuses. It reports improved accuracy on five benchmarks.

Contribution

The paper presents SkeletonAgent, an interactive framework that connects recognition models with LLMs through two agents, a Questioner and a Selector, to improve discrimination between similar actions.

Findings

01. Outperforms state-of-the-art methods on five benchmarks.
02. Effectively uses LLM feedback for finer-grained recognition.
03. Demonstrates robustness across diverse datasets.

Abstract

Recent advances in skeleton-based action recognition increasingly leverage semantic priors from Large Language Models (LLMs) to enrich skeletal representations. However, the LLM is typically queried in isolation from the recognition model and receives no performance feedback. As a result, it often fails to deliver the targeted discriminative cues critical for distinguishing similar actions. To overcome these limitations, we propose SkeletonAgent, a novel framework that bridges the recognition model and the LLM through two cooperative agents, i.e., Questioner and Selector. Specifically, the Questioner identifies the most frequently confused classes and supplies them to the LLM as context for more targeted guidance. Conversely, the Selector parses the LLM's response to extract precise joint-level constraints and feeds them back to the recognizer, enabling finer-grained cross-modal alignment.…
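The Questioner/Selector loop described in the abstract can be illustrated with a short Python sketch. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`most_confused_pairs`, `build_prompt`, `extract_joint_constraints`), the prompt wording, the joint vocabulary, and the substring-matching Selector are all hypothetical. The Questioner is approximated by ranking off-diagonal confusion counts, and the Selector by keeping joint names that appear in the LLM's reply; the paper's Selector, which extracts joint-level constraints, is likely more elaborate.

```python
from collections import Counter
from typing import List, Sequence, Tuple

# Hypothetical joint vocabulary; the real skeleton layout depends on the dataset.
JOINTS = ["head", "left hand", "right hand", "left elbow", "right elbow", "left knee", "right knee"]


def most_confused_pairs(
    y_true: Sequence[int], y_pred: Sequence[int], top_k: int = 3
) -> List[Tuple[int, int, int]]:
    """Questioner-style step (sketch): return (true, predicted, count) for the
    top_k most frequent misclassifications, i.e. off-diagonal confusion entries."""
    errors = Counter((t, p) for t, p in zip(y_true, y_pred) if t != p)
    # Highest count first; ties broken by class ids so the output is deterministic.
    ranked = sorted(errors.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(t, p, c) for (t, p), c in ranked[:top_k]]


def build_prompt(pairs: List[Tuple[int, int, int]], class_names: Sequence[str]) -> str:
    """Turn the confused pairs into context for the LLM (hypothetical wording)."""
    lines = [
        f"- '{class_names[t]}' is often misrecognized as '{class_names[p]}' ({c} times)"
        for t, p, c in pairs
    ]
    return (
        "Describe the joint-level motion differences that distinguish "
        "these confused actions:\n" + "\n".join(lines)
    )


def extract_joint_constraints(response: str, joint_vocab: Sequence[str] = JOINTS) -> List[str]:
    """Selector-style step (sketch): keep vocabulary joints mentioned in the LLM
    reply, in vocabulary order. Plain substring matching is an assumption."""
    text = response.lower()
    return [j for j in joint_vocab if j.lower() in text]


if __name__ == "__main__":
    class_names = ["reading", "writing", "clapping"]
    y_true = [0, 0, 0, 1, 1, 2, 2, 2]
    y_pred = [1, 1, 0, 0, 1, 2, 0, 2]
    pairs = most_confused_pairs(y_true, y_pred, top_k=2)
    print(pairs)  # → [(0, 1, 2), (1, 0, 1)]
    print(build_prompt(pairs, class_names))
    reply = "In writing, the right hand moves in small strokes while the head tilts down."
    print(extract_joint_constraints(reply))  # → ['head', 'right hand']
```

Ranking raw error counts is the simplest way to find "most frequently confused" classes; a real system might normalize by class frequency or use a validation split, which this sketch does not attempt.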


Code & Models

Models

firework8/SkeletonAgent (Hugging Face model)


Taxonomy

Topics: Human Pose and Action Recognition · Multimodal Machine Learning Applications · Action Observation and Synchronization