VPTracker: Global Vision-Language Tracking via Visual Prompt

Jingchao Wang; Kaiwen Zhou; Zhijian Wu; Kunhua Ji; Dingjiang Huang; Yefeng Zheng

arXiv:2512.22799·cs.CV·April 15, 2026

TL;DR

VPTracker introduces a global vision-language tracking framework leveraging multimodal large language models and spatial priors, significantly improving robustness and disambiguation in challenging scenarios.

Contribution

VPTracker is the first vision-language tracking framework to perform global search with multimodal large language models (MLLMs). It incorporates spatial priors to improve accuracy and to suppress distractions from similar objects.

Findings

1. Significantly improves tracking stability under occlusions and viewpoint changes.
2. Reduces false positives from similar objects through spatial priors.
3. Demonstrates superior performance compared to local search methods.

Abstract

Vision-Language Tracking aims to continuously localize objects described by a visual template and a language description. Existing methods, however, are typically limited to local search, making them prone to failure under viewpoint changes, occlusions, and rapid target movements. In this work, we introduce VPTracker, the first global tracking framework based on Multimodal Large Language Models (MLLMs), which exploits their powerful semantic reasoning to locate targets across the entire image space. While global search improves robustness and reduces drift, it also introduces distractions from visually or semantically similar objects. To address this, we propose a location-aware visual prompting mechanism that incorporates spatial priors into the MLLM. Specifically, we construct a region-level prompt based on the target's previous location, enabling the model to prioritize region-level…
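
The abstract says the region-level prompt is built from the target's previous location but, as excerpted here, does not say how. The sketch below illustrates one plausible way to derive such a region: enlarge the previous bounding box around its center and clip it to the frame. The function name `region_prompt_box`, the `scale` parameter, and the clipping rule are all assumptions made for illustration and are not taken from the paper or its repository.

```python
from typing import Tuple

# Box format assumed here: (x1, y1, x2, y2) in pixel coordinates.
Box = Tuple[float, float, float, float]


def region_prompt_box(prev_box: Box, image_size: Tuple[int, int],
                      scale: float = 2.0) -> Box:
    """Expand the previous target box around its center and clip to the image.

    Hypothetical helper: `scale` and the clipping rule are illustrative
    assumptions, not values or steps confirmed by the paper.
    """
    width, height = image_size
    x1, y1, x2, y2 = prev_box

    # Center of the previous box.
    cx, cy = (x1 + x2) / 2.0, (y1 + y2) / 2.0

    # Half extents of the enlarged region.
    half_w = (x2 - x1) * scale / 2.0
    half_h = (y2 - y1) * scale / 2.0

    # Clip the enlarged region to the image bounds.
    return (
        max(0.0, cx - half_w),
        max(0.0, cy - half_h),
        min(float(width), cx + half_w),
        min(float(height), cy + half_h),
    )


# Example: a 20x20 box centered in a 100x100 frame, doubled in size.
region = region_prompt_box((40, 40, 60, 60), (100, 100))
```

Under these assumptions, the returned region could then be rendered onto the full frame (for example as a rectangle outline) before the frame is passed to the MLLM, so that the model still sees the whole image while being nudged toward the area near the last known position.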

Code & Models

Repositories

jcwang0602/VPTracker
github

Models

jcwang0602/VPTracker
Hugging Face model · 13 downloads · 1 like
