CrossGLG: LLM Guides One-shot Skeleton-based 3D Action Recognition in a Cross-level Manner

Tingbing Yan; Wenzheng Zeng; Yang Xiao; Xingyu Tong; Bo Tan; Zhiwen Fang; Zhiguo Cao; Joey Tianyi Zhou

arXiv:2403.10082 · cs.CV · March 18, 2024 · 1 citation

TL;DR

CrossGLG uses high-level text descriptions generated by a large language model (LLM) to guide one-shot skeleton-based 3D action recognition in a cross-level (global-local-global) manner, improving accuracy and generalization.

Contribution

The paper proposes a global-local-global framework that uses LLM-generated text to guide feature learning in one-shot skeleton-based action recognition, together with a dual-branch architecture that mitigates the asymmetry between the training and inference phases while keeping inference efficient.

Findings

1. Outperforms state-of-the-art methods on three benchmarks.
2. Achieves significant accuracy improvements with minimal inference cost.
3. Can be integrated with existing skeleton encoders to boost their performance.

Abstract

Most existing one-shot skeleton-based action recognition methods focus on raw low-level information (e.g., joint locations), and may suffer from local information loss and low generalization ability. To alleviate these issues, we propose to leverage text descriptions generated by large language models (LLMs), which contain high-level human knowledge, to guide feature learning in a global-local-global way. Specifically, during training, we design two prompts to obtain global and local text descriptions of each action from an LLM. We first use the global text description to guide the skeleton encoder to focus on informative joints (i.e., global-to-local). Then we build non-local interaction between the local text and joint features to form the final global representation (i.e., local-to-global). To mitigate the asymmetry issue between the training and inference phases, we further design a dual-branch…
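The abstract does not say how the two guidance steps are implemented. The following is a minimal NumPy sketch under our own assumptions, not the authors' code: joint features and text embeddings share one dimension C; global-to-local guidance is modeled as a softmax weighting of joints by dot-product similarity to the global text embedding; and the non-local local-to-global interaction is modeled as single-head cross-attention (joints as queries, local text embeddings as keys and values) with a residual connection and mean pooling over joints. The function names `global_to_local`, `local_to_global`, and `crossglg_like_representation` are hypothetical. The sketch omits the LLM prompts, text encoder, training losses, and the dual-branch inference path.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    z = x - np.max(x, axis=axis, keepdims=True)
    e = np.exp(z)
    return e / np.sum(e, axis=axis, keepdims=True)


def global_to_local(joint_feats, global_text):
    """Hypothetical global-to-local step: re-weight joints by text similarity.

    joint_feats: (V, C) features for V skeleton joints (assumed shape).
    global_text: (C,) embedding of the global action description (assumed).
    Returns the re-weighted features (V, C) and the joint weights (V,).
    """
    c = joint_feats.shape[1]
    scores = joint_feats @ global_text / np.sqrt(c)  # (V,) scaled dot products
    weights = softmax(scores)                         # positive, sums to 1 over joints
    # Residual re-weighting: each joint is scaled by a factor in (1, 2),
    # so informative joints are amplified and none are zeroed out.
    reweighted = joint_feats * (1.0 + weights[:, None])
    return reweighted, weights


def local_to_global(joint_feats, local_text):
    """Hypothetical local-to-global step: cross-attention, then pooling.

    joint_feats: (V, C) joint features used as queries.
    local_text: (K, C) embeddings of K local descriptions (keys and values).
    Returns a single (C,) global representation.
    """
    c = joint_feats.shape[1]
    attn = softmax(joint_feats @ local_text.T / np.sqrt(c), axis=1)  # (V, K)
    fused = joint_feats + attn @ local_text                           # (V, C) residual
    return fused.mean(axis=0)                                         # pool over joints


def crossglg_like_representation(joint_feats, global_text, local_text):
    # Global-to-local guidance followed by local-to-global aggregation.
    reweighted, _ = global_to_local(joint_feats, global_text)
    return local_to_global(reweighted, local_text)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    V, C, K = 25, 64, 5  # arbitrary sizes: joints, feature dim, local descriptions
    joints = rng.standard_normal((V, C))
    g = rng.standard_normal(C)
    locs = rng.standard_normal((K, C))
    rep = crossglg_like_representation(joints, g, locs)
    print(rep.shape)  # (64,)
```

The residual `1 + weights` factor is a choice made for this sketch so that low-scoring joints are down-weighted relative to others rather than discarded; the abstract only states that the encoder is guided to focus on informative joints.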

Taxonomy

Topics: Human Pose and Action Recognition · Anomaly Detection Techniques and Applications · Gait Recognition and Analysis

Methods: Focus · Balanced Selection