UniSOT: A Unified Framework for Multi-Modality Single Object Tracking

Yinchao Ma; Yuyang Tang; Wenfei Yang; Tianzhu Zhang; Xu Zhou; Feng Wu

arXiv:2511.01427·cs.CV·November 4, 2025

TL;DR

UniSOT is a novel unified tracking framework capable of handling multiple reference and video modalities simultaneously, improving robustness and performance across diverse tracking scenarios.

Contribution

This paper introduces UniSOT, the first unified tracker that supports various reference and video modalities with a single model, enhancing practical applicability.

Findings

01. Outperforms modality-specific trackers on 18 benchmarks.

02. Achieves over 3.0% AUC improvement on TNL2K.

03. Surpasses Un-Track by over 2.0% across RGB+X modalities.

Abstract

Single object tracking aims to localize a target object, specified by a reference modality (bounding box, natural language, or both), in a sequence of a given video modality (RGB, RGB+Depth, RGB+Thermal, or RGB+Event). Different reference modalities enable different forms of human-machine interaction, and different video modalities are needed in complex scenarios to make tracking more robust. Existing trackers are designed for one or a few video modalities paired with one or a few reference modalities, which leads to separate model designs and limits practical applications. In practice, a unified tracker is needed to handle these varied requirements. To the best of our knowledge, there is still no tracker that can track with all of the above reference modalities across all of these video modalities simultaneously. Thus, in this paper, we present a unified tracker, UniSOT, for different…

Taxonomy

Topics: Video Surveillance and Tracking Methods · Gaze Tracking and Assistive Technology · Advanced Technologies in Various Fields