CiteTracker: Correlating Image and Text for Visual Tracking

Xin Li; Yuqing Huang; Zhenyu He; Yaowei Wang; Huchuan Lu; Ming-Hsuan Yang

arXiv:2308.11322 · cs.CV · August 23, 2023 · 5 cites

TL;DR

CiteTracker enhances visual tracking by converting target images into descriptive text and using attention-based correlation to improve robustness against target variations.

Contribution

The paper introduces CiteTracker, which converts the target image patch into a text description of its class and attributes, adapts that description as the target changes, and correlates it with the search image through attention.

Findings

01. Outperforms state-of-the-art methods on five datasets.

02. Effective in handling drastic target variations.

03. Demonstrates the benefit of integrating textual descriptions in tracking.

Abstract

Existing visual tracking methods typically take an image patch as the reference of the target to perform tracking. However, a single image patch cannot provide a complete and precise concept of the target object as images are limited in their ability to abstract and can be ambiguous, which makes it difficult to track targets with drastic variations. In this paper, we propose the CiteTracker to enhance target modeling and inference in visual tracking by connecting images and text. Specifically, we develop a text generation module to convert the target image patch into a descriptive text containing its class and attribute information, providing a comprehensive reference point for the target. In addition, a dynamic description module is designed to adapt to target variations for more effective target representation. We then associate the target description and the search image using an…
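
The correlation step named in the TL;DR can be illustrated with a generic scaled dot-product attention between one text embedding and a set of search-region features. This is a minimal sketch under stated assumptions, not the paper's implementation: the function names, the 4-dimensional toy vectors, and the use of a single softmax over patches are all hypothetical choices made for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def correlate_text_with_search(text_embedding, search_tokens):
    """Return one attention weight per search-region token.

    text_embedding: list of floats, a hypothetical embedding of the
        target description (assumed to come from some text encoder).
    search_tokens: list of float lists, hypothetical features of the
        patches in the search image.
    """
    scale = 1.0 / math.sqrt(len(text_embedding))
    # Scaled dot product between the description and every patch.
    scores = [
        scale * sum(t * s for t, s in zip(text_embedding, token))
        for token in search_tokens
    ]
    return softmax(scores)


# Toy inputs, invented for illustration only.
text_embedding = [1.0, 0.0, 1.0, 0.0]
search_tokens = [
    [0.9, 0.1, 0.8, 0.0],  # patch resembling the description
    [0.0, 1.0, 0.0, 1.0],  # unrelated background patch
    [0.5, 0.5, 0.5, 0.5],  # ambiguous patch
]

weights = correlate_text_with_search(text_embedding, search_tokens)
best_patch = max(range(len(weights)), key=weights.__getitem__)
print(best_patch, [round(w, 3) for w in weights])
```

In this toy setup the weights form a distribution over patches, and the patch whose features align most closely with the description receives the largest weight. A real tracker would feed such a response into a localization head rather than taking a plain argmax.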

Code & Models

Repositories

norahgreen/citetracker · PyTorch · Official

Taxonomy

Topics: Video Surveillance and Tracking Methods · Human Mobility and Location-Based Analysis · Face recognition and analysis