Manta: Enhancing Mamba for Few-Shot Action Recognition of Long Sub-Sequence
Wenbo Huang, Jinghui Zhang, Guang Li, Lei Zhang, Shuoyuan Wang, Fang Dong, Jiahui Jin, Takahiro Ogawa, Miki Haseyama

TL;DR
Manta enhances long sub-sequence few-shot action recognition by introducing local feature modeling and a hybrid contrastive learning framework, achieving state-of-the-art results on multiple benchmarks.
Contribution
The paper proposes Matryoshka Mamba, which combines multiple Inner Modules for local feature modeling with an Outer Module for temporal modeling, together with a hybrid contrastive learning paradigm, to improve few-shot action recognition (FSAR) performance.
Findings
Achieves new state-of-the-art results on SSv2, Kinetics, UCF101, and HMDB51
Improves FSAR performance on long sub-sequences
Demonstrates effectiveness of local feature modeling and hybrid contrastive learning
Abstract
In few-shot action recognition (FSAR), long sub-sequences of video naturally express entire actions more effectively. However, the high computational complexity of mainstream Transformer-based methods limits their application. Recent Mamba demonstrates efficiency in modeling long sequences, but directly applying Mamba to FSAR overlooks the importance of local feature modeling and alignment. Moreover, long sub-sequences within the same class accumulate intra-class variance, which adversely impacts FSAR performance. To solve these challenges, we propose a Matryoshka MAmba and CoNtrasTive LeArning framework (Manta). Firstly, the Matryoshka Mamba introduces multiple Inner Modules to enhance local feature representation, rather than directly modeling global features. An Outer Module captures dependencies of timeline between these local features for implicit temporal alignment. Secondly, a…
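The local-then-global idea in the abstract can be illustrated with a toy sketch. This is not the paper's architecture: the function names (`ema_scan`, `split_windows`, `nested_scan`), the fixed window size, and the decay values are all assumptions made for illustration, and a plain linear recurrence stands in for a Mamba block. An inner pass summarizes each local window of frame features, and an outer pass then scans those summaries in temporal order.

```python
from typing import List

Vector = List[float]


def ema_scan(seq: List[Vector], decay: float) -> List[Vector]:
    """Linear recurrence h_t = decay * h_{t-1} + (1 - decay) * x_t.

    A deliberately simple stand-in for a state-space scan; it is NOT Mamba.
    """
    state = [0.0] * len(seq[0])
    outputs = []
    for x in seq:
        state = [decay * h + (1.0 - decay) * v for h, v in zip(state, x)]
        outputs.append(state)
    return outputs


def split_windows(seq: List[Vector], window: int) -> List[List[Vector]]:
    """Cut the frame sequence into consecutive, non-overlapping local windows."""
    return [seq[i:i + window] for i in range(0, len(seq), window)]


def nested_scan(frames: List[Vector], window: int,
                inner_decay: float = 0.5,
                outer_decay: float = 0.9) -> List[Vector]:
    # Inner stage (assumed): scan each local window independently and keep
    # its final state as that window's local summary.
    summaries = [ema_scan(chunk, inner_decay)[-1]
                 for chunk in split_windows(frames, window)]
    # Outer stage (assumed): scan the window summaries in temporal order so
    # each output depends on the current and all earlier windows.
    return ema_scan(summaries, outer_decay)


# Hypothetical input: 8 frames, each a 2-dimensional feature vector.
frames = [[float(t), 1.0] for t in range(8)]
out = nested_scan(frames, window=4)
print(len(out), len(out[0]))  # prints "2 2": two windows, two features each
```

With 8 frames and a window of 4, the inner stage yields two local summaries and the outer stage returns one vector per window. The cost of each stage grows linearly with its input length, which is the property that makes scan-style models attractive for long sub-sequences.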
Taxonomy
Topics: Human Pose and Action Recognition · Anomaly Detection Techniques and Applications · Gait Recognition and Analysis
Methods: Mamba: Linear-Time Sequence Modeling with Selective State Spaces · Contrastive Learning
