Diff-Tracker: Text-to-Image Diffusion Models are Unsupervised Trackers

Zhengbo Zhang; Li Xu; Duo Peng; Hossein Rahmani; Jun Liu

arXiv:2407.08394·cs.CV·July 17, 2024

Open Access

TL;DR

Diff-Tracker utilizes pre-trained text-to-image diffusion models with learned prompts and online updates to perform unsupervised visual tracking, achieving state-of-the-art results across multiple benchmarks.

Contribution

The paper introduces an unsupervised tracking method built on a pre-trained text-to-image diffusion model: an initial prompt learner fits a prompt that represents the target, and an online prompt updater adapts that prompt as the target moves.

Findings

01. Achieves state-of-the-art performance on five benchmark datasets.

02. Effectively recognizes and tracks targets without supervision.

03. Demonstrates robustness across diverse tracking scenarios.

Abstract

We introduce Diff-Tracker, a novel approach for the challenging unsupervised visual tracking task leveraging the pre-trained text-to-image diffusion model. Our main idea is to leverage the rich knowledge encapsulated within the pre-trained diffusion model, such as the understanding of image semantics and structural information, to address unsupervised visual tracking. To this end, we design an initial prompt learner to enable the diffusion model to recognize the tracking target by learning a prompt representing the target. Furthermore, to facilitate dynamic adaptation of the prompt to the target's movements, we propose an online prompt updater. Extensive experiments on five benchmark datasets demonstrate the effectiveness of our proposed method, which also achieves state-of-the-art performance.
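The abstract describes two components, an initial prompt learner and an online prompt updater, without giving their details here. The following is only a toy sketch of that general idea under loud assumptions, not the authors' implementation: random NumPy arrays stand in for frozen diffusion-model features, the "prompt" is a single learnable vector, and localization uses a softmax similarity map. All names (`make_frame`, `fit_prompt`, `localize`, `track`), the loss, and the update schedule are hypothetical.

```python
import numpy as np

H, W, D, BOX = 16, 16, 8, 4  # toy grid size, feature dim, target box side

def make_frame(rng, top, left, signature):
    """Hypothetical stand-in for frozen features: noise plus a target signature."""
    feats = 0.1 * rng.normal(size=(H, W, D))
    feats[top:top + BOX, left:left + BOX] += signature
    return feats

def box_mask(top, left):
    """Binary mask that is 1 inside the BOX x BOX target region."""
    mask = np.zeros((H, W))
    mask[top:top + BOX, left:left + BOX] = 1.0
    return mask

def attention_map(feats, prompt):
    """Softmax over spatial positions of the feature-prompt similarity."""
    logits = feats.reshape(-1, D) @ prompt
    weights = np.exp(logits - logits.max())
    return (weights / weights.sum()).reshape(H, W)

def fit_prompt(feats, mask, prompt, steps, lr=0.5):
    """Gradient descent on cross-entropy between the normalized mask and the map.
    The features stay frozen; only the prompt vector changes."""
    flat = feats.reshape(-1, D)
    target = mask.reshape(-1) / mask.sum()
    prompt = prompt.copy()
    for _ in range(steps):
        attn = attention_map(feats, prompt).reshape(-1)
        prompt -= lr * (flat.T @ (attn - target))  # d(loss)/d(prompt)
    return prompt

def localize(attn):
    """Attention-weighted centroid (row, col) of the map."""
    rows, cols = np.mgrid[0:H, 0:W]
    return float((attn * rows).sum()), float((attn * cols).sum())

def track(frames, first_box):
    # "Initial prompt learner" (assumed form): fit the prompt on the first frame.
    prompt = fit_prompt(frames[0], box_mask(*first_box), np.zeros(D), steps=200)
    centers = []
    for feats in frames[1:]:
        row, col = localize(attention_map(feats, prompt))
        centers.append((row, col))
        # "Online prompt updater" (assumed form): a few extra gradient steps
        # on a pseudo-mask centered at the current prediction.
        top = int(np.clip(round(row - (BOX - 1) / 2), 0, H - BOX))
        left = int(np.clip(round(col - (BOX - 1) / 2), 0, W - BOX))
        prompt = fit_prompt(feats, box_mask(top, left), prompt, steps=10)
    return centers

rng = np.random.default_rng(0)
boxes = [(2, 3), (3, 5), (4, 7), (5, 9)]  # ground-truth top-left corners per frame
frames = []
for t, (top, left) in enumerate(boxes):
    signature = np.ones(D)
    signature[0] += 0.2 * t  # mild appearance drift over time
    frames.append(make_frame(rng, top, left, signature))

centers = track(frames, boxes[0])
print(centers)
```

Only the first frame's box is used as input; later positions come from the learned prompt, which mirrors the unsupervised setting in spirit. A real system would replace `make_frame` with features or cross-attention maps from an actual pre-trained diffusion model, which this sketch does not attempt.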

Taxonomy

Topics: Image Retrieval and Classification Techniques

Methods: Diffusion