Predicting the Best of N Visual Trackers
Basit Alawode, Sajid Javed, Arif Mahmood, and Jiri Matas

TL;DR
This paper introduces a meta-tracker that predicts, from only a few initial frames, which of N visual trackers will perform best on a given video sequence. The prediction network is built on self-supervised learning architectures, and the video-level meta-tracker outperforms existing SOTA trackers by a large margin on nine benchmarks.
Contribution
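The paper proposes the "Best of the N Trackers" (BofN) meta-tracker framework. At its core, a Tracking Performance Prediction Network (TP2N), based on self-supervised models, selects the predicted best tracker for a sequence. The paper reports that the meta-tracker outperforms existing SOTA trackers on standard benchmarks.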
Findings
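The video-level BofN meta-tracker outperforms state-of-the-art trackers on nine benchmarks.
The frame-level BofN meta-tracker re-predicts the best tracker at regular temporal intervals, so it can adapt to variations within long sequences.
DINO with a ViT-S backbone performs best among the tested self-supervised architectures.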
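The frame-level behaviour described in the findings can be sketched as a simple schedule: a selector is queried at regular intervals and its choice is held until the next query. This is only an illustration under stated assumptions, not the authors' implementation. The names `frame_level_schedule`, `choose_tracker`, and the tracker labels are hypothetical, and the real selector would be a learned network rather than a hand-written rule.

```python
from typing import Callable, List


def frame_level_schedule(
    num_frames: int,
    interval: int,
    choose_tracker: Callable[[int], str],
) -> List[str]:
    """Return the tracker used at each frame.

    `choose_tracker` is a hypothetical stand-in for the learned predictor:
    given a frame index, it returns the name of the tracker it expects to
    perform best from that point on. It is queried every `interval` frames.
    """
    if interval <= 0:
        raise ValueError("interval must be a positive number of frames")

    schedule: List[str] = []
    current = ""
    for t in range(num_frames):
        if t % interval == 0:
            # Re-predict the best tracker at a regular temporal interval.
            current = choose_tracker(t)
        schedule.append(current)
    return schedule


def toy_selector(t: int) -> str:
    # Made-up rule for illustration: conditions change at frame 4.
    return "tracker_a" if t < 4 else "tracker_b"


schedule = frame_level_schedule(num_frames=6, interval=2, choose_tracker=toy_selector)
```

With these toy inputs the selector is queried at frames 0, 2, and 4, so the schedule switches from `tracker_a` to `tracker_b` at frame 4. A shorter interval reacts faster to changes in the sequence at the cost of more predictor calls.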
Abstract
We observe that the performance of SOTA visual trackers varies surprisingly strongly across different video attributes and datasets. No single tracker remains the best performer across all tracking attributes and datasets. To bridge this gap, for a given video sequence, we predict the "Best of the N Trackers", called the BofN meta-tracker. At its core, a Tracking Performance Prediction Network (TP2N) selects the predicted best-performing visual tracker for the given video sequence using only a few initial frames. We also introduce a frame-level BofN meta-tracker, which keeps predicting the best performer at regular temporal intervals. The TP2N is based on the self-supervised learning architectures MoCo v2, SwAV, Barlow Twins (BT), and DINO; experiments show that DINO with ViT-S as a backbone performs the best. The video-level BofN meta-tracker outperforms, by a large margin, existing SOTA trackers on nine…
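The video-level selection step in the abstract amounts to scoring each candidate tracker from a few initial frames and picking the highest score. A minimal sketch follows, assuming the predictor returns one score per tracker. The function `select_best_tracker`, the `predict_scores` callable, the tracker labels, and the score values are all hypothetical placeholders; the actual TP2N is a learned network whose interface is not given here.

```python
from typing import Callable, Sequence

Frame = Sequence[float]  # hypothetical placeholder for an image


def select_best_tracker(
    initial_frames: Sequence[Frame],
    tracker_names: Sequence[str],
    predict_scores: Callable[[Sequence[Frame]], Sequence[float]],
) -> str:
    """Return the name of the tracker with the highest predicted score.

    `predict_scores` stands in for a performance-prediction network: it
    maps a few initial frames to one predicted score per candidate tracker.
    """
    scores = predict_scores(initial_frames)
    if len(scores) != len(tracker_names):
        raise ValueError("expected exactly one score per tracker")
    best_index = max(range(len(scores)), key=lambda i: scores[i])
    return tracker_names[best_index]


def toy_predictor(frames: Sequence[Frame]) -> Sequence[float]:
    # Made-up scores for illustration only; a real predictor would
    # compute these from the frame content.
    return [0.61, 0.74, 0.58]


names = ["tracker_a", "tracker_b", "tracker_c"]
frames = [[0.0], [0.1], [0.2]]  # three dummy "initial frames"
best = select_best_tracker(frames, names, toy_predictor)
```

Here `best` is `"tracker_b"`, the candidate with the highest toy score. Once selected, only that tracker needs to run on the rest of the sequence, which is what makes prediction from a few initial frames attractive compared with running all N trackers.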
Taxonomy
Topics: Data Visualization and Analytics
Methods: Attention Is All You Need · Artemisinin Optimization based on Malaria Therapy: Algorithm and Applications to Medical Image Segmentation · Fast Attention Via Positive Orthogonal Random Features · Layer Normalization · Linear Layer · Softmax · Performer · Multi-Head Attention · Dense Connections · Residual Connection
