Visual Object Tracking with Discriminative Filters and Siamese Networks: A Survey and Outlook
Sajid Javed, Martin Danelljan, Fahad Shahbaz Khan, Muhammad Haris Khan, Michael Felsberg, and Jiri Matas

TL;DR
This survey comprehensively reviews over 90 discriminative correlation filter and Siamese network-based visual object trackers, analyzing their theoretical foundations, performance across nine benchmarks, and outlining open research challenges.
Contribution
It provides a systematic review of DCF and Siamese trackers, including background theory, performance analysis, and future research directions in visual object tracking.
Findings
DCFs and Siamese networks have emerged as the dominant tracking paradigms and have driven significant progress over the last decade.
Performance varies across different benchmarks and datasets.
Open challenges include robustness, speed, and generalization.
Abstract
Accurate and robust visual object tracking is one of the most challenging and fundamental computer vision problems. It entails estimating the trajectory of the target in an image sequence, given only its initial location and segmentation, or its rough approximation in the form of a bounding box. Discriminative Correlation Filters (DCFs) and deep Siamese Networks (SNs) have emerged as dominating tracking paradigms, which have led to significant progress. Following the rapid evolution of visual object tracking in the last decade, this survey presents a systematic and thorough review of more than 90 DCFs and Siamese trackers, based on results in nine tracking benchmarks. First, we present the background theory of both the DCF and Siamese tracking core formulations. Then, we distinguish and comprehensively review the shared as well as specific open research challenges in both these…
