SUTrack: Towards Simple and Unified Single Object Tracking
Xin Chen, Ben Kang, Wanting Geng, Jiawen Zhu, Yi Liu, Dong Wang, and Huchuan Lu

TL;DR
SUTrack introduces a unified single object tracking framework capable of handling five diverse SOT tasks with a single model, reducing redundancy and enhancing cross-modal knowledge sharing, while outperforming task-specific models across multiple datasets.
Contribution
The paper presents a novel unified SOT framework that consolidates multiple tracking tasks into one model trained in a single session, with auxiliary strategies to boost performance.
Findings
Outperforms previous task-specific models on 11 datasets.
Effectively handles five diverse SOT tasks with a single model.
Provides models suitable for both edge devices and high-performance GPUs.
Abstract
In this paper, we propose a simple yet unified single object tracking (SOT) framework, dubbed SUTrack. It consolidates five SOT tasks (RGB-based, RGB-Depth, RGB-Thermal, RGB-Event, RGB-Language Tracking) into a unified model trained in a single session. Due to the distinct nature of the data, current methods typically design individual architectures and train separate models for each task. This fragmentation results in redundant training processes, repetitive technological innovations, and limited cross-modal knowledge sharing. In contrast, SUTrack demonstrates that a single model with a unified input representation can effectively handle various common SOT tasks, eliminating the need for task-specific designs and separate training sessions. Additionally, we introduce a task-recognition auxiliary training strategy and a soft token type embedding to further enhance SUTrack's performance…
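One way to picture the task-recognition auxiliary training strategy mentioned above is as an extra classification term added to the tracking loss, where the model also predicts which of the five tasks a sample came from. The sketch below is only an illustration under that assumption: the abstract does not give the loss formulation, so the function names, the cross-entropy choice, and the `aux_weight` parameter are all hypothetical rather than taken from the paper.

```python
import math

# The five tasks named in the abstract; the ordering here is arbitrary.
TASKS = ["RGB", "RGB-Depth", "RGB-Thermal", "RGB-Event", "RGB-Language"]


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def task_recognition_loss(task_logits, task_name):
    """Cross-entropy between predicted task logits and the true task.

    Assumption: the auxiliary objective is a plain 5-way classification.
    """
    probs = softmax(task_logits)
    return -math.log(probs[TASKS.index(task_name)])


def total_loss(tracking_loss, task_logits, task_name, aux_weight=1.0):
    """Combine a tracking loss with the hypothetical auxiliary term.

    `aux_weight` is an illustrative knob, not a value from the paper.
    """
    return tracking_loss + aux_weight * task_recognition_loss(task_logits, task_name)


# Example: an RGB-Thermal sample whose task head is already fairly confident.
example = total_loss(
    tracking_loss=0.8,
    task_logits=[0.1, 0.2, 3.0, 0.0, -0.5],
    task_name="RGB-Thermal",
    aux_weight=0.5,
)
```

In this reading, the auxiliary term is used only during training to encourage the shared model to keep task-discriminative information; whether SUTrack keeps or discards the task head at inference is not stated in the abstract.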
Taxonomy
Topics: Video Surveillance and Tracking Methods · Anomaly Detection Techniques and Applications
Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings · Fragmentation
