SwiTrack: Tri-State Switch for Cross-Modal Object Tracking

Boyue Xu; Ruichao Hou; Tongwei Ren; Dongming Zhou; Gangshan Wu; Jinde Cao

arXiv:2511.16227 · cs.CV · November 21, 2025

TL;DR

SwiTrack introduces a tri-stream framework for cross-modal object tracking that enhances feature robustness and reduces drift by using specialized streams, a consistency module, and dynamic template updates, achieving state-of-the-art results.

Contribution

The paper presents a novel tri-stream switch framework that combines modality-specific processing, consistency trajectory prediction, and dynamic template reconstruction to improve cross-modal object tracking (CMOT) performance.

Findings

01 Achieves a 7.2% higher precision rate

02 Boosts success rate by 4.3%

03 Runs in real time at 65 FPS

Abstract

Cross-modal object tracking (CMOT) is an emerging task that maintains target consistency while the video stream switches between different modalities, with only one modality available in each frame, mostly focusing on RGB-Near Infrared (RGB-NIR) tracking. Existing methods typically connect parallel RGB and NIR branches to a shared backbone, which limits the comprehensive extraction of distinctive modality-specific features and fails to address the issue of object drift, especially in the presence of unreliable inputs. In this paper, we propose SwiTrack, a novel state-switching framework that redefines CMOT through the deployment of three specialized streams. Specifically, RGB frames are processed by the visual encoder, while NIR frames undergo refinement via a NIR gated adapter coupled with the visual encoder to progressively calibrate shared latent space features, thereby yielding more…
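The per-frame routing described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the linear "encoder", the form of the gate (a sigmoid-weighted residual), the feature width `DIM`, and all function and weight names are assumptions made only for illustration, and the adapter is simply applied before the encoder rather than coupled into it. The third stream is not described in the truncated text above, so the sketch leaves it out.

```python
import numpy as np

rng = np.random.default_rng(0)
DIM = 8  # hypothetical feature width, chosen only for the demo

# Hypothetical stand-ins for learned weights (random here).
W_enc = rng.standard_normal((DIM, DIM)) * 0.1    # shared visual encoder
W_gate = rng.standard_normal((DIM, DIM)) * 0.1   # adapter gate
W_adapt = rng.standard_normal((DIM, DIM)) * 0.1  # adapter transform


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def visual_encoder(x):
    """Toy shared encoder: one linear map followed by tanh."""
    return np.tanh(x @ W_enc)


def nir_gated_adapter(x):
    """Assumed gating form: residual update scaled by a learned gate in (0, 1)."""
    gate = sigmoid(x @ W_gate)
    return x + gate * (x @ W_adapt)


def encode_frame(x, modality):
    """Route one frame's features by the modality available in that frame."""
    if modality == "rgb":
        return visual_encoder(x)
    if modality == "nir":
        # NIR features are refined by the adapter before the shared encoder.
        return visual_encoder(nir_gated_adapter(x))
    raise ValueError(f"unsupported modality: {modality!r}")


frame = rng.standard_normal(DIM)
rgb_feat = encode_frame(frame, "rgb")
nir_feat = encode_frame(frame, "nir")
```

Because only one modality is available in each frame, the dispatch happens per frame rather than by fusing two inputs; both routes end in the same encoder, so their outputs share one feature space.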

Taxonomy

Topics: Video Surveillance and Tracking Methods · Gaze Tracking and Assistive Technology · Advanced Technologies in Various Fields