Explicit Visual Prompts for Visual Object Tracking

Liangtao Shi; Bineng Zhong; Qihua Liang; Ning Li; Shengping Zhang; Xianxian Li

arXiv:2401.03142 · cs.CV · January 9, 2024 · 1 citation

Open Access

TL;DR

EVPTrack is an explicit visual prompts framework that uses spatio-temporal tokens and multi-scale information to improve visual object tracking, avoiding template updates and improving efficiency.

Contribution

The paper proposes a novel explicit visual prompts framework for tracking that leverages spatio-temporal tokens and multi-scale features, eliminating the need for template updating strategies.

Findings

01. Achieves competitive performance on six benchmarks.

02. Runs at real-time speed while effectively exploiting spatio-temporal information.

03. Handles target scale changes better through multi-scale prompts.

Abstract

How to effectively exploit spatio-temporal information is crucial to capture target appearance changes in visual tracking. However, most deep learning-based trackers mainly focus on designing a complicated appearance model or template updating strategy, while lacking the exploitation of context between consecutive frames and thus entailing the when-and-how-to-update dilemma. To address these issues, we propose a novel explicit visual prompts framework for visual tracking, dubbed EVPTrack. Specifically, we utilize spatio-temporal tokens to propagate information between consecutive frames without focusing on updating templates. As a result, we can not only alleviate the challenge of when-to-update, but also avoid the hyper-parameters associated with updating strategies. Then, we utilize the spatio-temporal tokens to generate explicit visual prompts that…
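
The idea of carrying a token from frame to frame, instead of deciding when to replace a template, can be illustrated with a toy sketch. This is not the EVPTrack implementation: the function names, the single-head dot-product attention, and the 0.5 mixing weight are all assumptions made for illustration, and the abstract does not specify any of them. The sketch only shows that a token refreshed on every frame needs no update schedule or update threshold.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, features):
    """Single-head dot-product attention of one query vector over feature vectors.

    Hypothetical stand-in for whatever attention the real tracker uses.
    """
    dim = len(query)
    scores = [
        sum(q * f for q, f in zip(query, feat)) / math.sqrt(dim)
        for feat in features
    ]
    weights = softmax(scores)
    return [
        sum(w * feat[i] for w, feat in zip(weights, features))
        for i in range(dim)
    ]


def propagate_token(frames, init_token, mix=0.5):
    """Carry one token through a sequence of per-frame feature lists.

    The token is refreshed on every frame, so there is no rule for
    when to update it. `mix` is an illustrative assumption, not a
    value taken from the paper.
    """
    token = list(init_token)
    history = []
    for features in frames:
        summary = attend(token, features)
        # Blend the previous token with what it read from the current frame.
        token = [(1 - mix) * t + mix * s for t, s in zip(token, summary)]
        history.append(token)
    return history
```

A call such as `propagate_token(frames, init_token)` returns one token per frame; in a real tracker each token would then condition the prediction for the following frame.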

Code & Models

Repositories

GXNU-ZhongLab/EVPTrack (PyTorch, official)

Videos

Explicit Visual Prompts for Visual Object Tracking · Underline

Taxonomy

Topics: Video Surveillance and Tracking Methods · Visual Attention and Saliency Detection · Face Recognition and Analysis

Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings · Focus