Efficient Motion Prompt Learning for Robust Visual Tracking

Jie Zhao; Xin Chen; Yongsheng Yuan; Michael Felsberg; Dong Wang; Huchuan Lu

arXiv:2505.16321 · cs.CV · March 10, 2026

TL;DR

This paper introduces a lightweight motion prompt learning method that integrates motion cues with visual features, making existing visual trackers more robust at minimal additional training cost.

Contribution

The paper presents a novel, plug-and-play motion prompt module that can be integrated into existing trackers to leverage temporal motion information effectively.

Findings

01. Significantly improves tracker robustness across seven benchmarks.
02. Achieves these improvements with minimal training costs.
03. Maintains real-time speed with negligible performance sacrifice.

Abstract

Due to the challenges of processing temporal information, most trackers depend solely on visual discriminability and overlook the unique temporal coherence of video data. In this paper, we propose a lightweight and plug-and-play motion prompt tracking method. It can be easily integrated into existing vision-based trackers to build a joint tracking framework leveraging both motion and vision cues, thereby achieving robust tracking through efficient prompt learning. A motion encoder with three different positional encodings is proposed to encode the long-term motion trajectory into the visual embedding space, while a fusion decoder and an adaptive weight mechanism are designed to dynamically fuse visual and motion features. We integrate our motion module into three different trackers with five models in total. Experiments on seven challenging tracking benchmarks demonstrate that the…

Code & Models

Repositories

zj5559/motion-prompt-tracking (PyTorch, official)

Videos

Efficient Motion Prompt Learning for Robust Visual Tracking · SlidesLive

Taxonomy

Topics: Video Surveillance and Tracking Methods · Human Pose and Action Recognition · Advanced Vision and Imaging

Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings