Exploring Enhanced Contextual Information for Video-Level Object Tracking
Ben Kang, Xin Chen, Simiao Lai, Yang Liu, Yi Liu, Dong Wang

TL;DR
This paper introduces MCITrack, a video-level object tracking framework that carries extensive contextual information across frames in Mamba's hidden states, improving tracking robustness and achieving state-of-the-art results.
Contribution
It proposes a new framework, MCITrack, with a Contextual Information Fusion module that effectively captures and utilizes extensive video-level context for object tracking.
Findings
Achieves 76.6% AUC on LaSOT
Attains 80.0% AO on GOT-10k
Demonstrates superior performance over existing methods
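For readers unfamiliar with the two metrics above, the following is a minimal sketch of how average overlap (AO) and success-plot AUC are commonly computed from per-frame IoU values. It assumes boxes in `(x, y, w, h)` format and 21 evenly spaced thresholds; the function names are illustrative, and the official LaSOT and GOT-10k toolkits may differ in details such as per-sequence averaging or how missing predictions are handled.

```python
import numpy as np

def iou_xywh(a, b):
    """IoU of two boxes given as (x, y, w, h). Box format is an assumption."""
    ax2, ay2 = a[0] + a[2], a[1] + a[3]
    bx2, by2 = b[0] + b[2], b[1] + b[3]
    # Width and height of the intersection rectangle (zero if disjoint).
    iw = max(0.0, min(ax2, bx2) - max(a[0], b[0]))
    ih = max(0.0, min(ay2, by2) - max(a[1], b[1]))
    inter = iw * ih
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0

def average_overlap(ious):
    """AO: mean IoU between predictions and ground truth over frames."""
    return float(np.mean(ious))

def success_auc(ious, num_thresholds=21):
    """Area under the success plot, approximated as the mean success rate
    over evenly spaced IoU thresholds in [0, 1]."""
    ious = np.asarray(ious, dtype=float)
    thresholds = np.linspace(0.0, 1.0, num_thresholds)
    # Success rate at threshold t: fraction of frames with IoU above t.
    success = [(ious > t).mean() for t in thresholds]
    return float(np.mean(success))

# Example: IoUs for a short hypothetical sequence.
example_ious = [iou_xywh((0, 0, 2, 2), (1, 1, 2, 2)), 0.8, 0.6]
ao = average_overlap(example_ious)
auc = success_auc(example_ious)
```

Both values are fractions in [0, 1]; the figures reported above are the same quantities expressed as percentages.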
Abstract
Contextual information at the video level has become increasingly crucial for visual object tracking. However, existing methods typically use only a few tokens to convey this information, which can lead to information loss and limit their ability to fully capture the context. To address this issue, we propose a new video-level visual object tracking framework called MCITrack. It leverages Mamba's hidden states to continuously record and transmit extensive contextual information throughout the video stream, resulting in more robust object tracking. The core component of MCITrack is the Contextual Information Fusion module, which consists of the mamba layer and the cross-attention layer. The mamba layer stores historical contextual information, while the cross-attention layer integrates this information into the current visual features of each backbone block. This module enhances the…
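The idea described in the abstract, a recurrent state that accumulates context plus a cross-attention step that injects it into the current frame's features, can be illustrated with a toy sketch. This is not the authors' implementation: real Mamba layers use selective state-space updates with input-dependent parameters, whereas this sketch substitutes a fixed-decay linear recurrence, and every name here (`ToyContextFusion`, `num_state_tokens`, `decay`) is a hypothetical chosen for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

class ToyContextFusion:
    """Toy stand-in for a context-fusion block (NOT the paper's module):
    a fixed-decay recurrent state plus single-head cross-attention."""

    def __init__(self, dim, num_state_tokens=4, decay=0.9, seed=0):
        rng = np.random.default_rng(seed)
        scale = 1.0 / np.sqrt(dim)
        self.dim = dim
        self.decay = decay
        # One input projection per state token so the tokens can differ.
        self.w_in = rng.normal(0, scale, (num_state_tokens, dim, dim))
        self.w_q = rng.normal(0, scale, (dim, dim))
        self.w_k = rng.normal(0, scale, (dim, dim))
        self.w_v = rng.normal(0, scale, (dim, dim))
        # Hidden state carried across frames: (num_state_tokens, dim).
        self.state = np.zeros((num_state_tokens, dim))

    def update_state(self, feats):
        """Simplified recurrence h_t = decay * h_{t-1} + (1 - decay) * u_t,
        where u_t is a projection of the mean-pooled frame features."""
        pooled = feats.mean(axis=0)                       # (dim,)
        u = np.einsum("d,sde->se", pooled, self.w_in)     # (S, dim)
        self.state = self.decay * self.state + (1.0 - self.decay) * u
        return self.state

    def fuse(self, feats):
        """Cross-attention: frame tokens query the stored state, and the
        result is added back to the features as a residual."""
        q = feats @ self.w_q                              # (N, dim)
        k = self.state @ self.w_k                         # (S, dim)
        v = self.state @ self.w_v                         # (S, dim)
        attn = softmax(q @ k.T / np.sqrt(self.dim))       # (N, S)
        return feats + attn @ v

# Usage: process a short stream of hypothetical per-frame features.
block = ToyContextFusion(dim=8)
frames = np.random.default_rng(1).normal(size=(3, 16, 8))
fused_frames = []
for feats in frames:
    fused_frames.append(block.fuse(feats))  # read context from past frames
    block.update_state(feats)               # then write this frame into it
```

On the first frame the state is all zeros, so the fused output equals the input; later frames receive a contribution from everything seen so far, which is the property the abstract attributes to carrying context in hidden states rather than in a few tokens.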
Taxonomy
Topics: Video Surveillance and Tracking Methods · Visual Attention and Saliency Detection
Methods: Mamba: Linear-Time Sequence Modeling with Selective State Spaces · Artemisinin Optimization based on Malaria Therapy: Algorithm and Applications to Medical Image Segmentation
