Multi-Modality Co-Learning for Efficient Skeleton-based Action Recognition

Jinfu Liu; Chen Chen; Mengyuan Liu

arXiv:2407.15706·cs.CV·August 16, 2024

TL;DR

This paper introduces a multi-modality co-learning framework that leverages multimodal large language models to improve skeleton-based action recognition, achieving high accuracy while maintaining efficiency during inference.

Contribution

The MMCL framework uses multimodal LLMs as auxiliary networks for skeleton-based action recognition: features are aligned and refined across modalities during training, while inference uses only skeleton data and therefore stays efficient.

Findings

01 Outperforms existing skeleton-based methods on benchmark datasets.

02 Demonstrates strong zero-shot and domain adaptation capabilities.

03 Effectively aligns RGB and skeleton features via contrastive learning.
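
The contrastive alignment of RGB and skeleton features named in the findings can be illustrated with a generic, symmetric InfoNCE-style loss. This is a minimal sketch under stated assumptions, not the loss from the paper or the linked repository: the function names `l2_normalize` and `info_nce`, the temperature value, and the toy two-dimensional features are all hypothetical. It assumes that row i of each feature list comes from the same video clip, so matching rows are positives and all other pairs are negatives.

```python
import math


def l2_normalize(vec):
    """Scale a non-zero vector to unit length so dot products are cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def info_nce(rgb_feats, skel_feats, temperature=0.1):
    """Symmetric InfoNCE-style loss between paired RGB and skeleton features.

    Hypothetical illustration: row i of rgb_feats and row i of skel_feats are
    assumed to describe the same clip (the positive pair).
    """
    rgb = [l2_normalize(v) for v in rgb_feats]
    skel = [l2_normalize(v) for v in skel_feats]

    # Temperature-scaled cosine similarity for every RGB/skeleton pair.
    logits = [
        [sum(a * b for a, b in zip(r, s)) / temperature for s in skel]
        for r in rgb
    ]

    def diagonal_cross_entropy(matrix):
        # Cross-entropy where the correct "class" for row i is column i.
        total = 0.0
        for i, row in enumerate(matrix):
            row_max = max(row)  # subtract the max for numerical stability
            log_z = row_max + math.log(sum(math.exp(x - row_max) for x in row))
            total += log_z - row[i]
        return total / len(matrix)

    # Average the RGB-to-skeleton and skeleton-to-RGB directions.
    transposed = [list(col) for col in zip(*logits)]
    return 0.5 * (diagonal_cross_entropy(logits) + diagonal_cross_entropy(transposed))


# Toy features: aligned pairs versus deliberately swapped pairs.
rgb_toy = [[1.0, 0.0], [0.0, 1.0]]
aligned_loss = info_nce(rgb_toy, [[1.0, 0.0], [0.0, 1.0]])
swapped_loss = info_nce(rgb_toy, [[0.0, 1.0], [1.0, 0.0]])
```

With these toy inputs the aligned pairing yields a much smaller loss than the swapped pairing, which is the behavior a contrastive objective rewards. In a training setup of the kind the abstract describes, such a term would only be computed during training, since inference uses skeleton data alone.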

Abstract

Skeleton-based action recognition has garnered significant attention due to the utilization of concise and resilient skeletons. Nevertheless, the absence of detailed body information in skeletons restricts performance, while other multimodal methods require substantial inference resources and are inefficient when using multimodal data during both training and inference stages. To address this and fully harness the complementary multimodal features, we propose a novel multi-modality co-learning (MMCL) framework by leveraging the multimodal large language models (LLMs) as auxiliary networks for efficient skeleton-based action recognition, which engages in multi-modality co-learning during the training stage and keeps efficiency by employing only concise skeletons in inference. Our MMCL framework primarily consists of two modules. First, the Feature Alignment Module (FAM) extracts rich RGB…

Code & Models

Repositories

liujf69/MMCL-Action
PyTorch · Official

Taxonomy

Topics: Human Pose and Action Recognition · Anomaly Detection Techniques and Applications · Gait Recognition and Analysis

Methods: Softmax · Attention Is All You Need