MV-GMN: State Space Model for Multi-View Action Recognition
Yuhui Lin, Jiaxuan Lu, Yue Yong, Jiahao Zhang

TL;DR
The paper introduces MV-GMN, a state-space model that efficiently combines multi-view, multi-modal, and temporal data for action recognition, outperforming Transformer-based models with lower computational complexity.
Contribution
It proposes a novel Multi-View Graph Mamba network with Bidirectional State Space Blocks and GCN modules, reducing computational costs while improving accuracy in multi-view action recognition.
Findings
Achieves 97.3% accuracy on NTU RGB+D 120 cross-subject
Outperforms Transformer-based baselines in accuracy
Inference complexity is only linear
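To illustrate where linear complexity comes from in a state-space model, here is a minimal sketch of a scalar linear recurrence. It is not the MV-GMN or Mamba implementation: the function name `ssm_scan` and the fixed parameters `a`, `b`, `c` are assumptions chosen for illustration, whereas Mamba-style models make these parameters input-dependent. The point is only that each token costs one constant-time state update, so the total cost grows linearly with sequence length rather than quadratically as in full self-attention.

```python
def ssm_scan(xs, a=0.5, b=1.0, c=1.0):
    """Run a scalar linear state-space recurrence over a sequence.

    h_t = a * h_{t-1} + b * x_t   (state update)
    y_t = c * h_t                 (readout)

    a, b, c are fixed, hypothetical parameters used only for illustration.
    """
    h = 0.0
    ys = []
    for x in xs:  # one constant-cost update per token, so O(len(xs)) overall
        h = a * h + b * x
        ys.append(c * h)
    return ys


outputs = ssm_scan([1.0, 1.0, 1.0])
print(outputs)
```

Doubling the length of `xs` doubles the number of loop iterations, which is the sense in which the scan is linear in the number of tokens.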
Abstract
Recent advancements in multi-view action recognition have largely relied on Transformer-based models. While effective and adaptable, these models often require substantial computational resources, especially in scenarios with multiple views and multiple temporal sequences. Addressing this limitation, this paper introduces the MV-GMN model, a state-space model specifically designed to efficiently aggregate multi-modal data (RGB and skeleton), multi-view perspectives, and multi-temporal information for action recognition with reduced computational complexity. The MV-GMN model employs an innovative Multi-View Graph Mamba network comprising a series of MV-GMN blocks. Each block includes a proposed Bidirectional State Space Block and a GCN module. The Bidirectional State Space Block introduces four scanning strategies, including view-prioritized and time-prioritized approaches. The GCN…
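The scanning strategies mentioned above can be pictured as different ways of flattening a grid of (view, frame) tokens into a one-dimensional sequence for a state-space block. The sketch below is an assumption, not the paper's definition: it guesses that the four strategies are the forward and backward passes of a view-prioritized order (all views of a frame are visited before moving to the next frame) and a time-prioritized order (all frames of a view are visited before moving to the next view). The function name `scan_orders` and the dictionary keys are hypothetical.

```python
def scan_orders(num_views, num_frames):
    """Return four hypothetical token orderings over a (view, frame) grid.

    Each ordering is a list of (view_index, frame_index) pairs.
    The forward/backward pairing is an assumption made for illustration.
    """
    # View-prioritized: the view index varies fastest.
    view_first = [(v, t) for t in range(num_frames) for v in range(num_views)]
    # Time-prioritized: the frame index varies fastest.
    time_first = [(v, t) for v in range(num_views) for t in range(num_frames)]
    return {
        "view_forward": view_first,
        "view_backward": view_first[::-1],
        "time_forward": time_first,
        "time_backward": time_first[::-1],
    }


orders = scan_orders(num_views=2, num_frames=3)
for name, order in orders.items():
    print(name, order)
```

Every ordering visits each token exactly once, so running a linear-time scan along all four keeps the total cost linear in the number of (view, frame) tokens.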
Taxonomy
Topics: Human Pose and Action Recognition · Anomaly Detection Techniques and Applications · Gait Recognition and Analysis
Methods: Graph Convolutional Network · Mamba: Linear-Time Sequence Modeling with Selective State Spaces
