Boosting Self-Supervised Tracking with Contextual Prompts and Noise Learning

Yaozong Zheng; Qihua Liang; Bineng Zhong; Shuimu Zeng; Yuanliang Xue; Ning Li; Shuxiang Song

arXiv:2605.06092·cs.CV·May 8, 2026

TL;DR

This paper introduces a self-supervised tracking framework that uses dual-modal context association, combining semantic prompts with contextual noise, to learn robust representations from unlabeled videos.

Contribution

It proposes a novel dual-modal context association mechanism with a two-stage training process, enhancing self-supervised tracking by leveraging semantic prompts and noise.

Findings

01 Outperforms existing self-supervised tracking methods in experiments.

02 Effectively learns robust tracking representations from unlabeled videos.

03 Maintains efficient inference by applying context association only during training.
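
The third finding, that context association is applied only during training, can be illustrated with a toy sketch. This is not the paper's architecture: the class `ToyTracker`, its `backbone` and `context_branch` methods, the mean-based "prediction", and the squared-difference consistency penalty are all hypothetical stand-ins, chosen only to show how an auxiliary noise-driven branch might be skipped entirely at inference time.

```python
import random


class ToyTracker:
    """Hypothetical stand-in for a tracker; not the method from the paper."""

    def __init__(self, noise_std=0.1, seed=0):
        self.training = True          # toggled off for inference
        self.noise_std = noise_std    # assumed noise scale, for illustration only
        self.rng = random.Random(seed)
        self.context_calls = 0        # counts how often the auxiliary branch runs

    def backbone(self, features):
        # Placeholder "prediction": the mean of the input features.
        return sum(features) / len(features)

    def context_branch(self, features):
        # Training-only auxiliary path: perturb the features with Gaussian
        # noise and run the same backbone on the noisy copy.
        self.context_calls += 1
        noisy = [f + self.rng.gauss(0.0, self.noise_std) for f in features]
        return self.backbone(noisy)

    def forward(self, features):
        pred = self.backbone(features)
        if not self.training:
            # Inference: the auxiliary branch is never executed.
            return pred, None
        aux = self.context_branch(features)
        # Toy consistency penalty between clean and noisy predictions.
        return pred, (pred - aux) ** 2
```

With `training` set to `False`, `forward` returns only the backbone prediction and never calls `context_branch`, so the auxiliary path adds no inference cost in this sketch.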

Abstract

Learning robust contextual knowledge from unlabeled videos is essential for advancing self-supervised tracking. However, conventional self-supervised trackers lack effective context modeling, while existing context association methods based on non-semantic queries struggle to adapt to unlabeled tracking scenarios, making it difficult to learn reliable contextual cues. In this work, we propose a novel self-supervised tracking framework that introduces a dual-modal context association mechanism, which jointly leverages fine-grained semantic prompts and contextual noise to drive the model toward learning robust tracking representations. Following the easy-to-hard learning principle, our contextual association mechanism operates in two stages. During early training, instance patch tokens (prompts) are assigned to both forward and backward tracking branches…
