RGBT Tracking via All-layer Multimodal Interactions with Progressive Fusion Mamba

Andong Lu, Wanyu Wang, Chenglong Li, Jin Tang, and Bin Luo

arXiv:2408.08827 · cs.CV · December 31, 2024

Open Access

TL;DR

This paper introduces AINet, an RGBT tracking network that performs efficient all-layer multimodal interactions in a progressive fusion Mamba, improving the robustness and accuracy of RGBT tracking.

Contribution

The paper proposes an All-layer multimodal Interaction Network (AINet) that combines a Difference-based Fusion Mamba and an Order-dynamic Fusion Mamba for efficient, comprehensive feature interaction across all modalities and layers.

Findings

01. Achieves state-of-the-art performance on four RGBT tracking datasets.

02. Effectively balances interaction capability and computational efficiency.

03. Demonstrates robustness and accuracy improvements over existing methods.

Abstract

Existing RGBT tracking methods often design various interaction models to perform cross-modal fusion at each layer, but cannot execute feature interactions among all layers, which play a critical role in robust multimodal representation, because of the large computational burden. To address this issue, this paper presents a novel All-layer multimodal Interaction Network, named AINet, which performs efficient and effective feature interactions of all modalities and layers in a progressive fusion Mamba for robust RGBT tracking. Even though modality features in different layers are known to contain different cues, it is always challenging to build multimodal interactions in each layer because interaction capability and efficiency are hard to balance. Meanwhile, considering that the feature discrepancy between RGB and thermal modalities reflects their complementary information to some…

Taxonomy

Topics: Video Surveillance and Tracking Methods · Advanced Vision and Imaging · Face recognition and analysis

Methods: Mamba: Linear-Time Sequence Modeling with Selective State Spaces