Visual Object Tracking with Discriminative Filters and Siamese Networks: A Survey and Outlook

Sajid Javed, Martin Danelljan, Fahad Shahbaz Khan, Muhammad Haris Khan, Michael Felsberg, and Jiri Matas

arXiv:2112.02838·cs.CV·December 7, 2021

TL;DR

This survey comprehensively reviews over 90 discriminative correlation filter and Siamese network-based visual object trackers, analyzing their theoretical foundations, performance across nine benchmarks, and outlining open research challenges.

Contribution

It provides a systematic review of DCF and Siamese trackers, including background theory, performance analysis, and future research directions in visual object tracking.

Findings

01

DCF and Siamese trackers have achieved significant progress.

02

Performance varies across different benchmarks and datasets.

03

Open challenges include robustness, speed, and generalization.

Abstract

Accurate and robust visual object tracking is one of the most challenging and fundamental computer vision problems. It entails estimating the trajectory of a target in an image sequence, given only its initial location and segmentation, or a rough approximation of these in the form of a bounding box. Discriminative Correlation Filters (DCFs) and deep Siamese Networks (SNs) have emerged as the dominant tracking paradigms and have led to significant progress. Following the rapid evolution of visual object tracking over the last decade, this survey presents a systematic and thorough review of more than 90 DCF and Siamese trackers, based on results on nine tracking benchmarks. First, we present the background theory of both the DCF and Siamese tracking core formulations. Then, we distinguish and comprehensively review the shared as well as specific open research challenges in both these…
