Multi-Modality Co-Learning for Efficient Skeleton-based Action Recognition
Jinfu Liu, Chen Chen, Mengyuan Liu

TL;DR
This paper introduces a multi-modality co-learning framework that leverages multimodal large language models to improve skeleton-based action recognition, achieving high accuracy while maintaining efficiency during inference.
Contribution
The MMCL framework uses multimodal LLMs as auxiliary networks for skeleton-based action recognition: features are aligned and refined across modalities during training, while inference uses only skeleton data and therefore stays efficient.
Findings
Outperforms existing skeleton-based methods on benchmark datasets.
Demonstrates strong zero-shot and domain adaptation capabilities.
Effectively aligns RGB and skeleton features via contrastive learning.
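To make the contrastive-alignment finding concrete, here is a minimal sketch of how paired skeleton and RGB features could be pulled together with a contrastive objective. It assumes a symmetric InfoNCE loss over cosine similarities; this summary does not state the exact loss used in the paper, and the names `info_nce`, `l2_normalize`, and the `temperature` value are hypothetical choices made for illustration only.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def info_nce(skeleton_feats, rgb_feats, temperature=0.1):
    """Symmetric InfoNCE loss for a batch of paired feature vectors.

    Assumption: row i of skeleton_feats and row i of rgb_feats come from
    the same action clip (a positive pair); all other rows are negatives.
    """
    s = [l2_normalize(v) for v in skeleton_feats]
    r = [l2_normalize(v) for v in rgb_feats]
    n = len(s)

    # logits[i][j]: cosine similarity of skeleton i and RGB j, temperature-scaled
    logits = [
        [sum(a * b for a, b in zip(s[i], r[j])) / temperature for j in range(n)]
        for i in range(n)
    ]

    def cross_entropy_diag(rows):
        # Mean cross-entropy where the correct class of row i is column i
        total = 0.0
        for i, row in enumerate(rows):
            m = max(row)  # subtract the max for numerical stability
            log_z = m + math.log(sum(math.exp(x - m) for x in row))
            total += log_z - row[i]
        return total / len(rows)

    # Average the skeleton-to-RGB and RGB-to-skeleton directions
    transposed = [list(col) for col in zip(*logits)]
    return 0.5 * (cross_entropy_diag(logits) + cross_entropy_diag(transposed))


# Toy features: correctly paired rows give a low loss, mismatched rows a high one
feats = [[1.0, 0.0], [0.0, 1.0]]
loss_aligned = info_nce(feats, feats)
loss_swapped = info_nce(feats, feats[::-1])
```

In this sketch the loss falls as each skeleton feature becomes more similar to its own RGB feature than to the others in the batch. Because the RGB branch contributes only to the training loss, it can be dropped at inference, which is consistent with the skeleton-only inference described in this summary.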
Abstract
Skeleton-based action recognition has garnered significant attention due to the utilization of concise and resilient skeletons. Nevertheless, the absence of detailed body information in skeletons restricts performance, while other multimodal methods require substantial inference resources and are inefficient when using multimodal data during both training and inference stages. To address this and fully harness the complementary multimodal features, we propose a novel multi-modality co-learning (MMCL) framework by leveraging the multimodal large language models (LLMs) as auxiliary networks for efficient skeleton-based action recognition, which engages in multi-modality co-learning during the training stage and keeps efficiency by employing only concise skeletons in inference. Our MMCL framework primarily consists of two modules. First, the Feature Alignment Module (FAM) extracts rich RGB…
Taxonomy
Topics: Human Pose and Action Recognition · Anomaly Detection Techniques and Applications · Gait Recognition and Analysis
Methods: Softmax · Attention Is All You Need
