MV-GMN: State Space Model for Multi-View Action Recognition

Yuhui Lin; Jiaxuan Lu; Yue Yong; Jiahao Zhang

arXiv:2501.13829 · cs.CV · January 24, 2025

TL;DR

The paper introduces MV-GMN, a state-space model that efficiently combines multi-view, multi-modal, and temporal data for action recognition, outperforming Transformer-based models with lower computational complexity.

Contribution

The paper proposes a Multi-View Graph Mamba network built from Bidirectional State Space Blocks and GCN modules, which reduces computational cost while improving accuracy in multi-view action recognition.
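To make the GCN component concrete, the following is a minimal sketch of a single graph convolution over skeleton joints. It assumes the widely used symmetric normalization with added self-loops; the page does not say which GCN variant MV-GMN uses, so the normalization, the three-joint chain, and all function names here are illustrative assumptions rather than the paper's design.

```python
import math


def normalized_adjacency(adj):
    """Return D^{-1/2} (A + I) D^{-1/2} for an adjacency matrix without self-loops."""
    n = len(adj)
    # Add self-loops so each joint keeps part of its own feature.
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]


def matmul(a, b):
    """Plain nested-list matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def gcn_layer(adj, features, weight):
    """One graph convolution: ReLU(A_norm @ X @ W)."""
    propagated = matmul(normalized_adjacency(adj), features)
    projected = matmul(propagated, weight)
    return [[max(0.0, value) for value in row] for row in projected]


# Hypothetical 3-joint chain (for example shoulder, elbow, wrist), 2 features per joint.
ADJ = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
W = [[1.0, 0.0], [0.0, 1.0]]  # identity weights, so the output shows only neighbor mixing

OUT = gcn_layer(ADJ, X, W)
```

With identity weights, each output row is a degree-weighted average of a joint's own features and those of its neighbors, which is the spatial mixing a GCN module contributes alongside a sequence model.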

Findings

01. Achieves 97.3% accuracy on NTU RGB+D 120 cross-subject

02. Outperforms Transformer-based baselines in accuracy

03. Requires only linear inference complexity
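The linear-complexity finding can be illustrated with the recurrent form of a state space model: each token triggers one state update, so the work grows in proportion to sequence length, whereas self-attention compares every token with every other token. The sketch below is a deliberately simplified scalar recurrence with fixed parameters; Mamba-style models use input-dependent (selective) parameters and vector states, and the names `ssm_scan` and `attention_pair_count` are hypothetical helpers, not part of any library.

```python
def ssm_scan(inputs, a=0.9, b=1.0, c=1.0):
    """Run h_t = a * h_{t-1} + b * x_t and y_t = c * h_t over a 1-D sequence.

    Returns the outputs and the number of state updates performed.
    """
    h = 0.0
    outputs = []
    steps = 0
    for x in inputs:
        h = a * h + b * x  # one constant-cost update per token
        outputs.append(c * h)
        steps += 1
    return outputs, steps


def attention_pair_count(length):
    """Number of query-key pairs scored by full self-attention."""
    return length * length


SEQ = [1.0, 0.0, 0.0, 0.0]
YS, STEPS = ssm_scan(SEQ, a=0.5)
# YS == [1.0, 0.5, 0.25, 0.125]; STEPS == 4, versus 16 attention pairs at this length
```

Doubling the sequence length doubles `STEPS` but quadruples `attention_pair_count`, which is why the gap matters most when many views and frames are concatenated into one long sequence.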

Abstract

Recent advancements in multi-view action recognition have largely relied on Transformer-based models. While effective and adaptable, these models often require substantial computational resources, especially in scenarios with multiple views and multiple temporal sequences. Addressing this limitation, this paper introduces the MV-GMN model, a state-space model specifically designed to efficiently aggregate multi-modal data (RGB and skeleton), multi-view perspectives, and multi-temporal information for action recognition with reduced computational complexity. The MV-GMN model employs an innovative Multi-View Graph Mamba network comprising a series of MV-GMN blocks. Each block includes a proposed Bidirectional State Space Block and a GCN module. The Bidirectional State Space Block introduces four scanning strategies, including view-prioritized and time-prioritized approaches. The GCN…
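As a rough illustration of what a scanning strategy could mean here, the sketch below flattens a grid of (view, frame) tokens into four one-dimensional orders: view-major, time-major, and the reverse of each. This is an assumption made for illustration. The abstract names view-prioritized and time-prioritized scanning but is truncated before describing all four strategies, and it is not determinable from this page which traversal the paper labels view-prioritized or whether the remaining two are reversals. The function `scan_orders` and its keys are hypothetical.

```python
def scan_orders(num_views, num_frames):
    """Return four candidate flattening orders over (view, frame) token indices."""
    # View-major: finish every frame of one view before moving to the next view.
    view_major = [(v, t) for v in range(num_views) for t in range(num_frames)]
    # Time-major: visit every view at one frame before advancing to the next frame.
    time_major = [(v, t) for t in range(num_frames) for v in range(num_views)]
    return {
        "view_major": view_major,
        "view_major_reversed": view_major[::-1],
        "time_major": time_major,
        "time_major_reversed": time_major[::-1],
    }


ORDERS = scan_orders(num_views=2, num_frames=3)
# ORDERS["view_major"] == [(0, 0), (0, 1), (0, 2), (1, 0), (1, 1), (1, 2)]
# ORDERS["time_major"] == [(0, 0), (1, 0), (0, 1), (1, 1), (0, 2), (1, 2)]
```

Every order visits the same tokens once, so the sequence model's cost is unchanged; what differs is which tokens sit next to each other in the sequence, and therefore which relationships (across views or across time) the recurrent state sees at short range.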

Taxonomy

Topics: Human Pose and Action Recognition · Anomaly Detection Techniques and Applications · Gait Recognition and Analysis

Methods: Graph Convolutional Network · Mamba: Linear-Time Sequence Modeling with Selective State Spaces