Exploring Enhanced Contextual Information for Video-Level Object Tracking

Ben Kang; Xin Chen; Simiao Lai; Yang Liu; Yi Liu; Dong Wang

arXiv:2412.11023·cs.CV·December 17, 2024

TL;DR

This paper introduces MCITrack, a video-level object tracking framework that carries extensive contextual information across the video stream in Mamba's hidden states. The added context makes tracking more robust and yields state-of-the-art results, including 76.6% AUC on LaSOT and 80.0% AO on GOT-10k.

Contribution

It proposes a new framework, MCITrack, with a Contextual Information Fusion module that effectively captures and utilizes extensive video-level context for object tracking.

Findings

01. Achieves 76.6% AUC on LaSOT

02. Attains 80.0% AO on GOT-10k

03. Demonstrates superior performance over existing methods
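
For readers unfamiliar with the two metrics quoted above, the following is a minimal sketch of how they are commonly computed from per-frame bounding boxes. It is not taken from the paper or from the official benchmark toolkits. The function names, the (x, y, w, h) box format, and the choice of 21 evenly spaced overlap thresholds are assumptions made for illustration, and the official GOT-10k protocol averages over sequences in a way this single-sequence version does not reproduce.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x, y, w, h)."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    inter_w = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def average_overlap(pred, gt):
    """AO-style score: mean IoU over all frames of one sequence."""
    overlaps = [iou(p, g) for p, g in zip(pred, gt)]
    return sum(overlaps) / len(overlaps)


def success_auc(pred, gt, num_thresholds=21):
    """Success-plot AUC: mean success rate over evenly spaced IoU thresholds.

    num_thresholds=21 (0.00, 0.05, ..., 1.00) is an assumed default.
    """
    overlaps = [iou(p, g) for p, g in zip(pred, gt)]
    thresholds = [i / (num_thresholds - 1) for i in range(num_thresholds)]
    rates = [sum(o > t for o in overlaps) / len(overlaps) for t in thresholds]
    return sum(rates) / len(rates)


if __name__ == "__main__":
    # Two frames: a perfect prediction and one shifted by half the box width.
    gt = [(0, 0, 10, 10), (0, 0, 10, 10)]
    pred = [(0, 0, 10, 10), (5, 0, 10, 10)]
    print("AO:", average_overlap(pred, gt))
    print("success AUC:", success_auc(pred, gt))
```

Both scores lie in [0, 1]; the percentages reported above correspond to these values multiplied by 100.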

Abstract

Contextual information at the video level has become increasingly crucial for visual object tracking. However, existing methods typically use only a few tokens to convey this information, which can lead to information loss and limit their ability to fully capture the context. To address this issue, we propose a new video-level visual object tracking framework called MCITrack. It leverages Mamba's hidden states to continuously record and transmit extensive contextual information throughout the video stream, resulting in more robust object tracking. The core component of MCITrack is the Contextual Information Fusion module, which consists of the mamba layer and the cross-attention layer. The mamba layer stores historical contextual information, while the cross-attention layer integrates this information into the current visual features of each backbone block. This module enhances the…
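
The data flow described in the abstract (a recurrent state that accumulates history, and cross-attention that reads it back into the current features) can be illustrated with a toy sketch. This is not the authors' implementation. The exponential-decay update below is a deliberately simple stand-in for a Mamba layer and is not a selective state space model. The attention has a single head and no learned projections, the fusion runs once per frame rather than inside each backbone block, and the order of state update and fusion is an assumption. All names (`update_state`, `cross_attend`, `track_video`, `decay`) are hypothetical.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def update_state(state, tokens, decay=0.9):
    """Toy stand-in for the Mamba layer (assumption, not a selective SSM):
    h_t = decay * h_{t-1} + (1 - decay) * x_t, applied per token slot."""
    return [
        [decay * h + (1.0 - decay) * x for h, x in zip(h_row, x_row)]
        for h_row, x_row in zip(state, tokens)
    ]


def cross_attend(queries, context):
    """Single-head dot-product cross-attention with a residual connection.
    Queries are the current frame's tokens; keys and values are the context."""
    dim = len(queries[0])
    fused = []
    for q in queries:
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(dim)
            for k in context
        ]
        weights = softmax(scores)
        attended = [
            sum(w * k[d] for w, k in zip(weights, context)) for d in range(dim)
        ]
        fused.append([qi + ai for qi, ai in zip(q, attended)])
    return fused


def track_video(frames, decay=0.9):
    """Carry one hidden state across all frames and fuse it into each frame."""
    n_tokens, dim = len(frames[0]), len(frames[0][0])
    state = [[0.0] * dim for _ in range(n_tokens)]
    outputs = []
    for tokens in frames:
        state = update_state(state, tokens, decay)   # record history
        outputs.append(cross_attend(tokens, state))  # inject history
    return outputs, state


if __name__ == "__main__":
    # Three identical frames, each with two 2-dimensional tokens.
    frames = [[[1.0, 0.0], [0.0, 1.0]] for _ in range(3)]
    outputs, state = track_video(frames)
    print(state)
```

The point of the sketch is only that the state has the same shape as the feature tokens, so the context passed between frames is a full feature map rather than a handful of summary tokens, which is the limitation of earlier methods that the abstract highlights.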

Code & Models

Repositories

kangben258/MCITrack (PyTorch, official)

Videos

Exploring Enhanced Contextual Information for Video-Level Object Tracking · underline

Taxonomy

Topics: Video Surveillance and Tracking Methods · Visual Attention and Saliency Detection

Methods: Mamba: Linear-Time Sequence Modeling with Selective State Spaces · Artemisinin Optimization based on Malaria Therapy: Algorithm and Applications to Medical Image Segmentation