Spatio-Temporal Matching for Siamese Visual Tracking
Jinpu Zhang, Yuehuan Wang

TL;DR
This paper introduces a spatio-temporal matching approach for Siamese visual tracking. It exploits 4-D information (height, width, channel, and time) instead of relying on plain cross correlation, and improves robustness and accuracy over cross-correlation-based Siamese trackers.
Contribution
It proposes a space-variant channel-guided correlation (SVC-Corr) for spatial matching and an aberrance repressed module for temporal matching, which together strengthen 4-D matching in object tracking. It builds both into a new anchor-free tracking framework.
Findings
Achieves state-of-the-art results on multiple benchmarks.
Effectively suppresses aberrances in interframe responses.
Enhances robustness and accuracy in challenging scenarios.
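The source does not define how interframe aberrance is measured. As a hedged illustration only, one plausible measure is the squared difference between two consecutive response maps after shifting the previous map so its peak lines up with the current one. Under this measure, a pure target translation scores zero and a sudden distractor peak scores high. The function names `aberrance` and `gaussian_response` are hypothetical and do not come from the paper. This is not the authors' aberrance repressed module.

```python
import numpy as np


def gaussian_response(size, center, sigma=1.5):
    """Hypothetical helper: a synthetic single-peak response map of shape (size, size)."""
    ii, jj = np.mgrid[0:size, 0:size]
    return np.exp(-((ii - center[0]) ** 2 + (jj - center[1]) ** 2) / (2 * sigma ** 2))


def aberrance(prev_resp, curr_resp):
    """Assumed aberrance measure (not the paper's definition).

    Circularly shifts the previous response map so that its peak aligns with
    the current peak, then returns the squared L2 difference. Pure motion of
    the target contributes nothing; shape changes such as distractor peaks do.
    """
    pi, pj = np.unravel_index(np.argmax(prev_resp), prev_resp.shape)
    ci, cj = np.unravel_index(np.argmax(curr_resp), curr_resp.shape)
    aligned = np.roll(prev_resp, shift=(ci - pi, cj - pj), axis=(0, 1))
    return float(np.sum((aligned - curr_resp) ** 2))
```

For example, `aberrance(base, np.roll(base, (2, 3), axis=(0, 1)))` is zero for a single-peak map `base`. Adding a second, weaker peak to the current map makes the value positive. A temporal module that represses aberrance would penalize growth in a quantity of this kind.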
Abstract
Similarity matching is a core operation in Siamese trackers. Most Siamese trackers learn similarity via cross correlation, an operation that originates in the image matching field. However, unlike 2-D image matching, the matching network in object tracking requires 4-D information (height, width, channel, and time). Cross correlation neglects the information in the channel and time dimensions and therefore produces ambiguous matching. This paper proposes a spatio-temporal matching process that thoroughly explores 4-D matching in space (height, width, and channel) and time. In spatial matching, we introduce a space-variant channel-guided correlation (SVC-Corr) that recalibrates channel-wise feature responses at each spatial location, guiding the generation of target-aware matching features. In temporal matching, we investigate the time-domain context relations of the…
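The abstract's point about channel information can be made concrete. Plain cross correlation slides the template features over the search features and sums over channels, so it yields a single response map. A depth-wise variant keeps one response map per channel. The sketch below adds a space-variant channel weighting on top of the depth-wise version: at every spatial location it applies a softmax over channels. This weighting is an assumption chosen only to show what "recalibrating channel-wise responses for each spatial location" can look like. It is not the paper's SVC-Corr, and all function names are hypothetical.

```python
import numpy as np


def naive_xcorr(z, x):
    """Plain cross correlation: template z (C, h, w) over search x (C, H, W).

    Sums over channels, so the output is one map of shape (H-h+1, W-w+1)
    and all per-channel information is collapsed.
    """
    _, h, w = z.shape
    _, H, W = x.shape
    out = np.zeros((H - h + 1, W - w + 1))
    for i in range(H - h + 1):
        for j in range(W - w + 1):
            out[i, j] = np.sum(z * x[:, i:i + h, j:j + w])
    return out


def depthwise_xcorr(z, x):
    """Per-channel correlation: keeps a (C, H-h+1, W-w+1) response volume."""
    C, h, w = z.shape
    _, H, W = x.shape
    out = np.zeros((C, H - h + 1, W - w + 1))
    for i in range(H - h + 1):
        for j in range(W - w + 1):
            out[:, i, j] = np.sum(z * x[:, i:i + h, j:j + w], axis=(1, 2))
    return out


def space_variant_channel_xcorr(z, x):
    """Hypothetical space-variant channel recalibration (not the paper's SVC-Corr).

    At each spatial location, a softmax over the C per-channel responses gives
    channel weights, so the weighting differs from one location to the next.
    Returns the reweighted volume and the weights.
    """
    r = depthwise_xcorr(z, x)
    e = np.exp(r - r.max(axis=0, keepdims=True))   # subtract max for numerical stability
    weights = e / e.sum(axis=0, keepdims=True)     # sums to 1 over channels per location
    return weights * r, weights
```

Summing the depth-wise volume over channels recovers the plain cross-correlation map exactly. This shows that the plain operation discards the channel dimension the abstract refers to. A practical tracker would compute these correlations with batched convolutions rather than Python loops.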
Taxonomy
Topics: Video Surveillance and Tracking Methods · Human Pose and Action Recognition · Face recognition and analysis
