Mamba-FETrack V2: Revisiting State Space Model for Frame-Event based Visual Object Tracking

Shiao Wang; Ju Huang; Qingchuan Ma; Jinfeng Gao; Chunyi Xu; Xiao Wang; Lan Chen; Bo Jiang

arXiv:2506.23783 · cs.CV · July 1, 2025

TL;DR

This paper introduces Mamba-FETrack V2, an efficient RGB-Event object tracking framework that leverages a linear-complexity Vision Mamba network for effective cross-modal feature fusion, achieving high performance with reduced computational cost.
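The "linear-complexity" property comes from the state space recurrence inside Mamba: each token updates a fixed-size hidden state once, so cost grows linearly with sequence length instead of quadratically as in self-attention. The sketch below is a simplified, pure-Python illustration of a Mamba-style selective scan, not code from Mamba-FETrack V2 or the mamba_fetrack repository. Assumptions: the function and parameter names (`selective_scan`, `w_delta`, `w_b`, `w_c`) are hypothetical; the step size Delta uses a per-channel scalar weight; B is discretized with a simple Euler step; and the convolution, gating, bidirectional scanning, and hardware-aware parallel scan used in real Vision Mamba blocks are omitted.

```python
import math
from typing import List

Vector = List[float]


def softplus(z: float) -> float:
    # Numerically stable log(1 + exp(z)); keeps the step size Delta positive.
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def matvec(m: List[Vector], v: Vector) -> Vector:
    # Plain matrix-vector product: m is (rows x len(v)).
    return [sum(mi * vi for mi, vi in zip(row, v)) for row in m]


def selective_scan(xs: List[Vector], a: List[Vector], w_delta: Vector,
                   w_b: List[Vector], w_c: List[Vector]) -> List[Vector]:
    """Simplified (hypothetical) Mamba-style selective scan.

    xs:      sequence of L tokens, each a D-dim vector
    a:       D x S matrix of negative decay rates (the SSM's A)
    w_delta: D per-channel weights producing the input-dependent step Delta_t
    w_b:     S x D projection giving the input-dependent B_t
    w_c:     S x D projection giving the input-dependent C_t
    """
    d_model, d_state = len(a), len(a[0])
    h = [[0.0] * d_state for _ in range(d_model)]  # hidden state, D x S
    ys: List[Vector] = []
    for x in xs:                   # single pass over tokens: O(L * D * S)
        b = matvec(w_b, x)         # B_t depends on the current token ("selective")
        c = matvec(w_c, x)         # C_t depends on the current token
        y: Vector = []
        for d in range(d_model):
            delta = softplus(w_delta[d] * x[d])
            for s in range(d_state):
                a_bar = math.exp(delta * a[d][s])  # zero-order-hold discretization of A
                h[d][s] = a_bar * h[d][s] + delta * b[s] * x[d]  # Euler step for B
            y.append(sum(c[s] * h[d][s] for s in range(d_state)))
        ys.append(y)
    return ys
```

Because the state `h` has a fixed size D x S, doubling the number of tokens doubles the work, and each output depends only on the current and earlier tokens.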

Contribution

The paper presents a novel lightweight Prompt Generator and a Vision Mamba-based backbone for multimodal tracking, improving efficiency and cross-modal interaction over existing transformer-based methods.

Findings

1. Superior tracking accuracy on multiple benchmarks
2. Reduced computational complexity compared to transformer-based models
3. Effective multimodal feature fusion demonstrated
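The efficiency finding can be made concrete with a rough operation count. This is a back-of-the-envelope sketch under stated assumptions, not a measurement reported in the paper: for N tokens of width D, computing the self-attention score matrix QK^T costs about N^2 D multiply-adds, while a selective scan with state size S costs about N D S. The values N = 640, D = 384, S = 16 below are hypothetical, chosen only for illustration; linear projections (about N D^2, present in both architectures) and constant factors are ignored.

$$
C_{\text{attn}} \approx N^{2} D, \qquad C_{\text{scan}} \approx N D S, \qquad \frac{C_{\text{attn}}}{C_{\text{scan}}} = \frac{N}{S}
$$

$$
N = 640,\; D = 384,\; S = 16:\qquad C_{\text{attn}} \approx 157{,}286{,}400, \qquad C_{\text{scan}} \approx 3{,}932{,}160, \qquad \frac{N}{S} = 40
$$

The gap N/S grows linearly with the number of tokens, which matters for RGB-Event tracking because template and search tokens from two modalities are processed together.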

Abstract

Combining traditional RGB cameras with bio-inspired event cameras for robust object tracking has garnered increasing attention in recent years. However, most existing multimodal tracking algorithms depend heavily on high-complexity Vision Transformer architectures for feature extraction and fusion across modalities. This not only leads to substantial computational overhead but also limits the effectiveness of cross-modal interactions. In this paper, we propose an efficient RGB-Event object tracking framework based on the linear-complexity Vision Mamba network, termed Mamba-FETrack V2. Specifically, we first design a lightweight Prompt Generator that utilizes embedded features from each modality, together with a shared prompt pool, to dynamically generate modality-specific learnable prompt vectors. These prompts, along with the modality-specific embedded features, are then fed into a…
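The abstract describes a Prompt Generator that combines each modality's embedded features with a shared prompt pool to produce modality-specific prompts, which are then passed to the backbone together with the embedded features. The paper's exact design is not given in this summary, so the sketch below is a hypothetical illustration only: it summarizes each modality by mean-pooling its tokens, attends from that summary to the shared pool with scaled dot-product softmax weights, and returns the weighted mixture as one prompt per modality. The function names (`generate_prompt`, `build_sequence`), the mean-pooling, the one-prompt-per-modality choice, and the "prompt then tokens, RGB then event" concatenation order are all assumptions.

```python
import math
from typing import Dict, List

Vector = List[float]


def mean_token(tokens: List[Vector]) -> Vector:
    # Average one modality's embedded tokens into a single summary query.
    n = len(tokens)
    return [sum(col) / n for col in zip(*tokens)]


def softmax(scores: Vector) -> Vector:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def generate_prompt(tokens: List[Vector], pool: List[Vector]) -> Vector:
    """Hypothetical prompt generator: attend from the modality summary to the
    shared prompt pool and return the softmax-weighted mixture of pool entries."""
    q = mean_token(tokens)
    scale = 1.0 / math.sqrt(len(q))
    weights = softmax([scale * sum(qi * pi for qi, pi in zip(q, p)) for p in pool])
    return [sum(w * p[d] for w, p in zip(weights, pool)) for d in range(len(q))]


def build_sequence(embedded: Dict[str, List[Vector]], pool: List[Vector]) -> List[Vector]:
    """Prepend each modality's prompt to its tokens, then concatenate modalities
    into one sequence for the backbone (ordering is an assumption)."""
    seq: List[Vector] = []
    for name in ("rgb", "event"):
        tokens = embedded[name]
        seq.append(generate_prompt(tokens, pool))
        seq.extend(tokens)
    return seq
```

Because the pool is shared, both modalities draw from the same set of learnable vectors, while the input-dependent weights make the resulting prompts differ per modality.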

Code & Models

Repositories

event-ahu/mamba_fetrack (PyTorch, official)
