LLMs are Good Action Recognizers
Haoxuan Qu, Yujun Cai, Jun Liu

TL;DR
This paper introduces LLM-AR, a framework that uses a large language model as a skeleton-based action recognizer by projecting each action signal (i.e., each skeleton sequence) into a sentence format; experiments show promising results.
Contribution
The paper proposes using large language models as action recognizers, made possible by a linguistic projection process that converts skeleton data into a sentence format.
Findings
Effective action recognition via LLMs demonstrated
Linguistic projection enhances recognition accuracy
Framework shows promising experimental results
Abstract
Skeleton-based action recognition has attracted lots of research attention. Recently, to build an accurate skeleton-based action recognizer, a variety of works have been proposed. Among them, some works use large model architectures as backbones of their recognizers to boost the skeleton data representation capability, while some other works pre-train their recognizers on external data to enrich the knowledge. In this work, we observe that large language models which have been extensively used in various natural language processing tasks generally hold both large model architectures and rich implicit knowledge. Motivated by this, we propose a novel LLM-AR framework, in which we investigate treating the Large Language Model as an Action Recognizer. In our framework, we propose a linguistic projection process to project each input action signal (i.e., each skeleton sequence) into its…
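The abstract names the linguistic projection step but this excerpt does not describe how it is implemented. As a minimal sketch only, assuming a simple nearest-neighbor vector quantization, each frame of a skeleton sequence can be mapped to its closest codebook entry, and each entry can be paired with a "word" token so the sequence reads as a sentence. The names `nearest_code`, `project_to_sentence`, `codebook`, and `vocabulary` are hypothetical illustrations, not the authors' method or API.

```python
from typing import List, Sequence

# Hypothetical representation: one frame = flattened joint coordinates.
Frame = Sequence[float]


def nearest_code(frame: Frame, codebook: Sequence[Frame]) -> int:
    """Return the index of the codebook entry closest to `frame` (squared L2 distance)."""
    best_idx, best_dist = 0, float("inf")
    for idx, code in enumerate(codebook):
        dist = sum((a - b) ** 2 for a, b in zip(frame, code))
        if dist < best_dist:
            best_idx, best_dist = idx, dist
    return best_idx


def project_to_sentence(
    sequence: Sequence[Frame],
    codebook: Sequence[Frame],
    vocabulary: List[str],
) -> str:
    """Sketch of a 'linguistic projection': quantize every frame to a codebook
    entry, look up that entry's word, and join the words into a sentence.
    This is an assumed nearest-neighbor scheme, not the paper's projection."""
    if len(vocabulary) != len(codebook):
        raise ValueError("each codebook entry needs exactly one word")
    words = [vocabulary[nearest_code(frame, codebook)] for frame in sequence]
    return " ".join(words)


if __name__ == "__main__":
    # Toy 2-D "skeleton" features with a two-entry codebook (illustrative values only).
    codebook = [[0.0, 0.0], [1.0, 1.0]]
    vocabulary = ["stand", "raise"]
    sequence = [[0.1, 0.0], [0.9, 1.1], [0.2, 0.1]]
    print(project_to_sentence(sequence, codebook, vocabulary))  # stand raise stand
```

In a setup like this, the resulting sentence would be the text handed to the language model; a learned codebook and richer per-frame features would replace the toy values shown here.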
Taxonomy
Topics: Imbalanced Data Classification Techniques · Artificial Intelligence in Law · Artificial Intelligence in Healthcare
