SkeletonAgent: An Agentic Interaction Framework for Skeleton-based Action Recognition
Hongda Liu, Yunfan Liu, Changlu Wang, Yunlong Wang, Zhenan Sun

TL;DR
SkeletonAgent is a cooperative framework for skeleton-based action recognition that combines semantic priors from LLMs with feedback from the recognition model, improving accuracy across multiple benchmarks.
Contribution
The paper presents SkeletonAgent, an interactive framework that connects the recognition model to an LLM through two agents, a Questioner and a Selector, so that the LLM's guidance targets the actions the model confuses most.
Findings
Outperforms state-of-the-art methods on five benchmarks.
Effectively utilizes LLM feedback for finer-grained recognition.
Demonstrates robustness across diverse datasets.
Abstract
Recent advances in skeleton-based action recognition increasingly leverage semantic priors from Large Language Models (LLMs) to enrich skeletal representations. However, the LLM is typically queried in isolation from the recognition model and receives no performance feedback. As a result, it often fails to deliver the targeted discriminative cues critical to distinguish similar actions. To overcome these limitations, we propose SkeletonAgent, a novel framework that bridges the recognition model and the LLM through two cooperative agents, i.e., Questioner and Selector. Specifically, the Questioner identifies the most frequently confused classes and supplies them to the LLM as context for more targeted guidance. Conversely, the Selector parses the LLM's response to extract precise joint-level constraints and feeds them back to the recognizer, enabling finer-grained cross-modal alignment.…
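The Questioner step described in the abstract (finding the most frequently confused classes and passing them to the LLM as context) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `most_confused_pairs` and `build_prompt`, the `top_k` parameter, and the prompt wording are all hypothetical, and the sketch assumes that "confused" means counting misclassified (true, predicted) label pairs on held-out predictions.

```python
from collections import Counter


def most_confused_pairs(y_true, y_pred, top_k=3):
    """Return the top_k (true, predicted) label pairs with the most errors.

    Hypothetical helper: counts only misclassified samples, so correct
    predictions never appear in the result.
    """
    errors = Counter((t, p) for t, p in zip(y_true, y_pred) if t != p)
    return errors.most_common(top_k)


def build_prompt(pairs, class_names):
    """Turn confused pairs into a text query for an LLM (wording is assumed)."""
    lines = [
        f"- '{class_names[t]}' is often mistaken for '{class_names[p]}' ({n} times)"
        for (t, p), n in pairs
    ]
    return (
        "These action classes are frequently confused:\n"
        + "\n".join(lines)
        + "\nWhich body joints best distinguish each pair?"
    )


if __name__ == "__main__":
    # Toy labels for illustration only; not data from the paper.
    names = ["drink water", "brush teeth", "wave"]
    y_true = [0, 0, 0, 1, 1, 2]
    y_pred = [1, 1, 0, 1, 0, 2]
    pairs = most_confused_pairs(y_true, y_pred, top_k=2)
    print(build_prompt(pairs, names))
```

In a setup like this, the LLM's answer would then be parsed for joint-level cues (the Selector's role in the abstract); that parsing step is not sketched here because the abstract does not specify its format.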
Taxonomy
Topics: Human Pose and Action Recognition · Multimodal Machine Learning Applications · Action Observation and Synchronization
