Mamba-FETrack: Frame-Event Tracking via State Space Model
Ju Huang, Shiao Wang, Shuai Wang, Zhe Wu, Xiao Wang, Bo Jiang

TL;DR
Mamba-FETrack introduces a novel, efficient RGB-Event tracking framework based on State Space Models, significantly reducing computational costs while maintaining high accuracy in multi-modal object tracking.
Contribution
This work presents a new Mamba-based framework that uses State Space Models for RGB-Event tracking, improving both efficiency and interactive learning between the two modalities compared with existing Transformer-based methods.
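The core operation stacked inside a Mamba backbone is a selective state space scan. Below is a minimal NumPy sketch of that operation. It assumes a diagonal state matrix A and input-dependent step size, B and C, as in Mamba. It omits the convolution, gating and skip connection of a full Mamba block. The function and parameter names (selective_scan, W_delta, W_B, W_C) are illustrative and are not taken from the Mamba-FETrack code.

```python
import numpy as np


def softplus(z):
    # Numerically stable log(1 + exp(z)); keeps step sizes positive
    return np.logaddexp(0.0, z)


def selective_scan(x, A, W_delta, W_B, W_C):
    """Run a simplified selective SSM over a token sequence.

    x:        (L, D) input tokens, e.g. patch embeddings of one modality
    A:        (D, N) diagonal state-matrix entries, negative for stability
    W_delta:  (D, D) projection giving a per-channel step size per token
    W_B, W_C: (D, N) projections giving input-dependent B_t and C_t
    Returns:  (L, D) outputs; y[t] depends only on x[0..t]
    """
    L, D = x.shape
    N = A.shape[1]
    h = np.zeros((D, N))  # one N-dimensional hidden state per channel
    y = np.empty((L, D))
    for t in range(L):
        xt = x[t]
        delta = softplus(xt @ W_delta)          # (D,) step sizes
        B_t = xt @ W_B                          # (N,) input-dependent B
        C_t = xt @ W_C                          # (N,) input-dependent C
        A_bar = np.exp(delta[:, None] * A)      # zero-order-hold discretization of A
        B_bar = delta[:, None] * B_t[None, :]   # simplified (Euler) discretization of B
        h = A_bar * h + B_bar * xt[:, None]     # O(D * N) work per token
        y[t] = h @ C_t                          # (D,) readout
    return y


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    L, D, N = 16, 8, 4  # hypothetical sequence length, width, state size
    x = rng.standard_normal((L, D))
    A = -np.exp(rng.standard_normal((D, N)))
    W_delta = 0.1 * rng.standard_normal((D, D))
    W_B = 0.1 * rng.standard_normal((D, N))
    W_C = 0.1 * rng.standard_normal((D, N))
    print(selective_scan(x, A, W_delta, W_B, W_C).shape)  # (16, 8)
```

Because the loop carries a fixed-size state, the cost grows linearly with the number of tokens. Self-attention, by contrast, compares every pair of tokens. A production Mamba layer computes the same recurrence with a hardware-aware parallel scan rather than a Python loop.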
Findings
Achieves higher SR/PR than ViT-S-based trackers.
Reduces GPU memory consumption by about 9.5%.
Decreases FLOPs and parameters by over 94%.
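SR and PR are the success rate and precision rate reported by RGB-Event tracking benchmarks. The sketch below assumes the common OTB-style definitions. SR is the area under the success curve, averaged over IoU thresholds from 0 to 1 in steps of 0.05. PR is the fraction of frames whose box-center error is at most 20 pixels. Boxes are assumed to be in (x, y, w, h) format. The benchmark toolkits may use different threshold grids or boundary conventions, and the function names here are hypothetical.

```python
import numpy as np


def iou_xywh(pred, gt):
    """Per-frame IoU for boxes given as (x, y, w, h) arrays of shape (F, 4)."""
    x1 = np.maximum(pred[:, 0], gt[:, 0])
    y1 = np.maximum(pred[:, 1], gt[:, 1])
    x2 = np.minimum(pred[:, 0] + pred[:, 2], gt[:, 0] + gt[:, 2])
    y2 = np.minimum(pred[:, 1] + pred[:, 3], gt[:, 1] + gt[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    union = pred[:, 2] * pred[:, 3] + gt[:, 2] * gt[:, 3] - inter
    return inter / np.maximum(union, 1e-12)


def success_rate(pred, gt, thresholds=np.linspace(0.0, 1.0, 21)):
    """SR: mean over thresholds of the fraction of frames with IoU > t."""
    ious = iou_xywh(pred, gt)
    return float(np.mean([(ious > t).mean() for t in thresholds]))


def precision_rate(pred, gt, max_dist=20.0):
    """PR: fraction of frames whose center error is within max_dist pixels."""
    c_pred = pred[:, :2] + pred[:, 2:] / 2
    c_gt = gt[:, :2] + gt[:, 2:] / 2
    err = np.linalg.norm(c_pred - c_gt, axis=1)
    return float((err <= max_dist).mean())


if __name__ == "__main__":
    # Two hypothetical frames: one perfect prediction, one complete miss
    gt = np.array([[10, 10, 20, 20], [10, 10, 20, 20]], dtype=float)
    pred = np.array([[10, 10, 20, 20], [100, 100, 20, 20]], dtype=float)
    print(round(success_rate(pred, gt), 3), precision_rate(pred, gt))  # 0.476 0.5
```

With a strict `>` comparison, an IoU of exactly 1.0 does not count at the threshold t = 1. As a result, the perfect frame contributes to 20 of the 21 thresholds, and SR here is 10/21 rather than 0.5.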
Abstract
RGB-Event based tracking is an emerging research topic that focuses on how to effectively integrate heterogeneous multi-modal data (synchronously exposed video frames and asynchronous pulse Event streams). Existing works typically employ Transformer-based networks to handle these modalities and achieve decent accuracy on multiple datasets through input-level or feature-level fusion. However, these trackers incur high memory consumption and computational complexity because of the self-attention mechanism. This paper proposes Mamba-FETrack, a novel RGB-Event tracking framework based on the State Space Model (SSM), which achieves high-performance tracking while effectively reducing computational costs. Specifically, we adopt two modality-specific Mamba backbone networks to extract the features of RGB frames and Event streams. Then, we also propose…
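The abstract attributes the cost of Transformer trackers to self-attention. A rough way to see the scaling is to count only the token-mixing work. Self-attention compares every pair of tokens, while an SSM scan carries a fixed-size state along the sequence. The sketch below uses hypothetical sizes: width 384 as in ViT-S, state size 16 as in Mamba's default configuration, and made-up token counts. It ignores the linear projections and MLPs, so it does not reproduce the FLOPs or parameter figures reported for Mamba-FETrack.

```python
def attention_mixing_macs(n_tokens: int, dim: int) -> int:
    """Approximate multiply-accumulates for self-attention token mixing.

    Q @ K^T costs n*n*dim and softmax(scores) @ V costs another n*n*dim;
    the Q/K/V/output projections are deliberately left out.
    """
    return 2 * n_tokens * n_tokens * dim


def ssm_scan_macs(n_tokens: int, dim: int, d_state: int) -> int:
    """Approximate operations for a diagonal SSM scan.

    Per token, every (channel, state) pair does one decay multiply, one
    input injection and one readout multiply-accumulate; projections are
    again left out.
    """
    return 3 * n_tokens * dim * d_state


if __name__ == "__main__":
    dim, d_state = 384, 16  # assumed: ViT-S width, Mamba's default state size
    for n in (320, 640, 1280):  # hypothetical token counts per sequence
        attn = attention_mixing_macs(n, dim)
        ssm = ssm_scan_macs(n, dim, d_state)
        print(f"n={n:5d}  attention={attn:>13,d}  scan={ssm:>11,d}  ratio={attn / ssm:5.1f}")
```

Under these assumptions, the ratio equals 2n / (3 · d_state). The gap therefore widens linearly as more template, search or Event tokens are added to a sequence.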
Taxonomy
Topics: Fault Detection and Control Systems
Methods: Attention Is All You Need · Dropout · Residual Connection · Softmax · Position-Wise Feed-Forward Layer · Byte Pair Encoding · Absolute Position Encodings · Linear Layer · Dense Connections · Label Smoothing
