Mamba-FETrack V2: Revisiting State Space Model for Frame-Event based Visual Object Tracking
Shiao Wang, Ju Huang, Qingchuan Ma, Jinfeng Gao, Chunyi Xu, Xiao Wang, Lan Chen, Bo Jiang

TL;DR
This paper introduces Mamba-FETrack V2, an efficient RGB-Event object tracking framework that leverages a linear-complexity Vision Mamba network for effective cross-modal feature fusion, achieving high performance with reduced computational cost.
Contribution
The paper presents a novel lightweight Prompt Generator and a Vision Mamba-based backbone for multimodal tracking, improving efficiency and cross-modal interaction over existing transformer-based methods.
Findings
Superior tracking accuracy on multiple benchmarks
Reduced computational complexity compared to transformer-based models
Effective multimodal feature fusion demonstrated
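The complexity finding can be made concrete with a small sketch. The recurrence below is a plain diagonal linear state space model, not Mamba's selective scan (where the parameters depend on the input), and the names `ssm_scan`, `a`, `b`, and `c` are illustrative rather than taken from the paper. Each token triggers exactly one state update, so the cost grows linearly with the sequence length L, whereas self-attention scores every pair of tokens and grows as L squared.

```python
import numpy as np


def ssm_scan(x, a, b, c):
    """Run a diagonal linear state space recurrence over a token sequence.

    x: (L, D) input tokens.
    a, b, c: (D, N) per-channel parameters (assumed fixed, i.e. not selective).
    Returns y of shape (L, D). Cost is O(L * D * N): one update per token.
    """
    seq_len, dim = x.shape
    h = np.zeros_like(a, dtype=float)        # hidden state, shape (D, N)
    y = np.empty((seq_len, dim))
    for t in range(seq_len):
        h = a * h + b * x[t][:, None]        # h_t = A h_{t-1} + B x_t
        y[t] = (c * h).sum(axis=1)           # y_t = C h_t
    return y


# Usage: a single channel with a one-dimensional state.
x = np.ones((3, 1))
a = np.full((1, 1), 0.5)
b = np.ones((1, 1))
c = np.ones((1, 1))
y = ssm_scan(x, a, b, c)
```

Doubling the number of tokens doubles the number of loop iterations here, while the number of attention scores in a Transformer layer would quadruple.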
Abstract
Combining traditional RGB cameras with bio-inspired event cameras for robust object tracking has garnered increasing attention in recent years. However, most existing multimodal tracking algorithms depend heavily on high-complexity Vision Transformer architectures for feature extraction and fusion across modalities. This not only leads to substantial computational overhead but also limits the effectiveness of cross-modal interactions. In this paper, we propose an efficient RGB-Event object tracking framework based on the linear-complexity Vision Mamba network, termed Mamba-FETrack V2. Specifically, we first design a lightweight Prompt Generator that utilizes embedded features from each modality, together with a shared prompt pool, to dynamically generate modality-specific learnable prompt vectors. These prompts, along with the modality-specific embedded features, are then fed into a…
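One way to picture the Prompt Generator described in the abstract is sketched below. The abstract states only that embedded features and a shared prompt pool are combined to produce modality-specific prompts; everything else here is an assumption made for illustration: the class name `PromptGenerator`, the mean-pooled feature summary, the per-prompt query projection `w_query`, the softmax mixing over the pool, and prepending the prompts to the token sequence.

```python
import numpy as np


def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # subtract max for stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


class PromptGenerator:
    """Hypothetical sketch: mix a shared prompt pool using modality features."""

    def __init__(self, pool_size, num_prompts, dim, seed=0):
        rng = np.random.default_rng(seed)
        self.dim = dim
        # Prompt pool shared by both modalities (learnable in a real model).
        self.pool = rng.normal(size=(pool_size, dim))
        # Assumed: one query projection per generated prompt.
        self.w_query = rng.normal(size=(num_prompts, dim, dim)) / np.sqrt(dim)

    def __call__(self, feats):
        """feats: (N, D) embedded tokens of one modality -> (P, D) prompts."""
        summary = feats.mean(axis=0)                              # (D,)
        queries = np.einsum("pde,e->pd", self.w_query, summary)   # (P, D)
        logits = queries @ self.pool.T / np.sqrt(self.dim)        # (P, K)
        weights = softmax(logits, axis=-1)                        # rows sum to 1
        return weights @ self.pool                                # (P, D)


# Usage with random stand-ins for the embedded RGB and event tokens.
rng = np.random.default_rng(1)
rgb_tokens = rng.normal(size=(64, 32))
event_tokens = rng.normal(size=(64, 32)) + 1.0

generator = PromptGenerator(pool_size=8, num_prompts=4, dim=32)
rgb_prompts = generator(rgb_tokens)
event_prompts = generator(event_tokens)

# Assumed: prompts are prepended to the tokens of the same modality.
rgb_input = np.concatenate([rgb_prompts, rgb_tokens], axis=0)
event_input = np.concatenate([event_prompts, event_tokens], axis=0)
```

Because the pool is shared while the mixing weights depend on each modality's own features, the two modalities draw on common parameters yet receive different prompts, which matches the "shared pool, modality-specific prompts" description at a high level.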
Taxonomy
Topics: Advanced Memory and Neural Computing · Advanced Neural Network Applications · Age of Information Optimization
Methods: Dropout · Vision Transformer · Mamba: Linear-Time Sequence Modeling with Selective State Spaces · Label Smoothing · Byte Pair Encoding · Absolute Position Encodings · Layer Normalization · Dense Connections · Softmax · Transformer
