ProContEXT: Exploring Progressive Context Transformer for Tracking
Jin-Peng Lan, Zhi-Qi Cheng, Jun-Yan He, Chenyang Li, Bin Luo, Xu Bao, Wangmeng Xiang, Yifeng Geng, Xuansong Xie

TL;DR
ProContEXT introduces a progressive context encoding transformer for visual object tracking that leverages spatial and temporal contexts to improve accuracy and robustness in dynamic scenes, outperforming existing methods.
Contribution
It presents a novel framework that combines spatial-temporal context encoding with token pruning to enhance tracking performance and efficiency.
Findings
Achieves state-of-the-art results on GOT-10k and TrackingNet datasets.
Effectively models multi-context information for robust tracking.
Reduces computational complexity through token pruning.
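The efficiency claim in the last finding can be illustrated with a small sketch. It assumes each search-region token already has a relevance score (for example, attention received from template tokens) and keeps only the top fraction. The function names `prune_tokens` and `attention_pairs`, the `keep_ratio` parameter, and the scoring scheme are hypothetical illustrations, not the paper's actual pruning rule.

```python
import math


def prune_tokens(tokens, scores, keep_ratio=0.7):
    """Keep the highest-scoring fraction of tokens, preserving original order.

    Hypothetical helper: the scoring scheme and keep_ratio are assumptions
    made for illustration, not values taken from the paper.
    """
    if len(tokens) != len(scores):
        raise ValueError("tokens and scores must have the same length")
    if not 0.0 < keep_ratio <= 1.0:
        raise ValueError("keep_ratio must be in (0, 1]")
    n_keep = max(1, math.ceil(len(tokens) * keep_ratio))
    # Rank token indices by score, highest first.
    ranked = sorted(range(len(tokens)), key=lambda i: scores[i], reverse=True)
    # Restore spatial order among the survivors.
    kept = sorted(ranked[:n_keep])
    return [tokens[i] for i in kept], kept


def attention_pairs(n_tokens):
    """Number of query-key pairs in full self-attention over n_tokens."""
    return n_tokens * n_tokens


if __name__ == "__main__":
    tokens = ["t0", "t1", "t2", "t3"]
    scores = [0.1, 0.9, 0.4, 0.8]
    kept_tokens, kept_idx = prune_tokens(tokens, scores, keep_ratio=0.5)
    print(kept_tokens, kept_idx)
    print(attention_pairs(len(tokens)), "->", attention_pairs(len(kept_tokens)))
```

Because self-attention compares every token with every other token, halving the token count cuts the number of query-key pairs to roughly a quarter, which is the general reason pruning lowers cost.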
Abstract
Existing Visual Object Tracking (VOT) methods take only the target area in the first frame as a template. This causes tracking to inevitably fail in fast-changing and crowded scenes, as it cannot account for changes in object appearance between frames. To this end, we revamped the tracking framework with Progressive Context Encoding Transformer Tracker (ProContEXT), which coherently exploits spatial and temporal contexts to predict object motion trajectories. Specifically, ProContEXT leverages a context-aware self-attention module to encode the spatial and temporal context, refining and updating the multi-scale static and dynamic templates to progressively perform accurate tracking. It explores the complementarity between spatial and temporal context, opening a new pathway to multi-context modeling for transformer-based trackers. In addition, ProContEXT revised the token pruning technique to…
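As a rough illustration of the idea of encoding static and dynamic templates jointly with the search region, the toy sketch below concatenates the three token groups and runs one round of single-head self-attention. Everything here is an assumption for illustration: the function names, the identity query/key/value projections, and the tiny 2-D tokens are hypothetical and do not reproduce the paper's module.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens):
    """Single-head self-attention with identity Q/K/V projections.

    Toy simplification: a real transformer uses learned projections,
    multiple heads, and feed-forward layers.
    """
    d = len(tokens[0])
    out = []
    for q in tokens:
        # Scaled dot-product similarity between this query and every key.
        logits = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in tokens]
        weights = softmax(logits)
        # Weighted average of all value vectors.
        out.append([sum(w * v[c] for w, v in zip(weights, tokens)) for c in range(d)])
    return out


def encode_context(static_tmpl, dynamic_tmpl, search):
    """Mix search tokens with static (first-frame) and dynamic (recent) templates.

    Hypothetical helper: returns only the updated search tokens.
    """
    tokens = static_tmpl + dynamic_tmpl + search
    mixed = self_attention(tokens)
    n_template = len(static_tmpl) + len(dynamic_tmpl)
    return mixed[n_template:]


if __name__ == "__main__":
    static_tmpl = [[1.0, 0.0]]    # appearance from the first frame
    dynamic_tmpl = [[0.0, 1.0]]   # appearance from a recent frame
    search = [[1.0, 1.0], [0.5, 0.5]]
    for token in encode_context(static_tmpl, dynamic_tmpl, search):
        print(token)
```

Concatenating before attention lets every search token attend to both the original and the most recent appearance of the target, which is one simple way to read "spatial and temporal context" in the abstract.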
Taxonomy
Topics: Video Surveillance and Tracking Methods · Air Quality Monitoring and Forecasting · Advanced Technologies in Various Fields
Methods: Multi-Head Attention · Attention Is All You Need · Pruning · Linear Layer · Softmax · Adam · Label Smoothing · Position-Wise Feed-Forward Layer · Dense Connections
