Reading Relevant Feature from Global Representation Memory for Visual Object Tracking
Xinyu Zhou, Pinxue Guo, Lingyi Hong, Jinglun Li, Wei Zhang, Weifeng Ge, Wenqiang Zhang

TL;DR
This paper introduces a novel visual object tracking method that uses relevance attention and a global representation memory to dynamically select, for each search region, the most relevant historical features, improving tracking efficiency and accuracy.
Contribution
It proposes a relevance attention mechanism combined with a global memory that adaptively selects pertinent historical features for each frame, reducing redundancy and enhancing tracking performance.
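The selection idea can be illustrated with a minimal sketch. It assumes that relevance is scored by scaled dot-product similarity and that each search-region token attends only to its top-k memory tokens. The function name `relevance_attention` and the fixed parameter `k` are hypothetical; the paper's actual mechanism may choose tokens differently (for example, adaptively rather than with a fixed k).

```python
import math
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: List[float]) -> List[float]:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def relevance_attention(query: Vector, keys: List[Vector],
                        values: List[Vector], k: int) -> Vector:
    """Attend from one search-region token to only its k most relevant memory tokens.

    Hypothetical sketch: relevance = scaled dot-product similarity (an assumption),
    and a fixed top-k cut replaces whatever selection rule the paper actually uses.
    """
    if not keys:
        raise ValueError("memory is empty")
    scale = 1.0 / math.sqrt(len(query))
    scores = [dot(query, key) * scale for key in keys]
    k = min(k, len(keys))
    # Indices of the k highest-scoring memory tokens; all others are ignored.
    top = sorted(range(len(keys)), key=lambda i: scores[i], reverse=True)[:k]
    weights = softmax([scores[i] for i in top])
    out = [0.0] * len(values[0])
    for w, i in zip(weights, top):
        for d in range(len(out)):
            out[d] += w * values[i][d]
    return out


if __name__ == "__main__":
    keys = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
    values = [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]
    print(relevance_attention([1.0, 0.0], keys, values, k=1))
```

Because the softmax runs only over the selected tokens, memory entries outside the top k contribute nothing to the output. That is one simple way to realize the redundancy reduction described above, at the cost of a hard cutoff that a learned selection rule might avoid.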
Findings
Achieves competitive results on five challenging datasets.
Operates at 71 FPS, demonstrating real-time capability.
Effectively reduces redundancy by selecting relevant historical features.
Abstract
Reference features from a template or historical frames are crucial for visual object tracking. Prior works use all features from a fixed template or memory. However, because videos are dynamic, the historical reference information required by different search regions at different time steps also varies. Therefore, using all features in the template and memory can lead to redundancy and impair tracking performance. To alleviate this issue, we propose a novel tracking paradigm, consisting of a relevance attention mechanism and a global representation memory, which can adaptively assist the search region in selecting the most relevant historical information from reference features. Specifically, the proposed relevance attention mechanism in this work differs from previous approaches in that it can dynamically choose and build the optimal…
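The abstract does not say how the global representation memory is maintained. As a purely hypothetical illustration, the sketch below keeps a fixed-capacity store, credits each entry with the attention weight it receives, and, when new frame features arrive, evicts the least-used older entries to make room. `GlobalMemory`, `MemoryToken`, `credit`, and the usage-based eviction rule are all assumptions for illustration, not the authors' design.

```python
from dataclasses import dataclass, field
from typing import List

Vector = List[float]


@dataclass
class MemoryToken:
    feature: Vector
    usage: float = 0.0  # accumulated attention weight credited by past queries


@dataclass
class GlobalMemory:
    """Hypothetical fixed-capacity memory with usage-based eviction (an assumption)."""
    capacity: int
    tokens: List[MemoryToken] = field(default_factory=list)

    def credit(self, index: int, weight: float) -> None:
        """Record that a search-region query attended to token `index` with `weight`."""
        self.tokens[index].usage += weight

    def write(self, features: List[Vector]) -> None:
        """Store new frame features, evicting the least-used older tokens if needed.

        New tokens are always kept, so fresh features are not evicted just because
        they have not yet accumulated any usage.
        """
        if len(features) > self.capacity:
            raise ValueError("more new features than memory capacity")
        new = [MemoryToken(list(f)) for f in features]
        budget = self.capacity - len(new)  # slots left for older tokens
        old = self.tokens
        if len(old) > budget:
            ranked = sorted(range(len(old)), key=lambda i: old[i].usage, reverse=True)
            keep = sorted(ranked[:budget])  # survivors stay in insertion order
            old = [old[i] for i in keep]
        self.tokens = old + new


if __name__ == "__main__":
    mem = GlobalMemory(capacity=3)
    mem.write([[1.0], [2.0]])
    mem.credit(0, 0.9)  # first token was highly relevant to a query
    mem.credit(1, 0.1)
    mem.write([[3.0], [4.0]])
    print([t.feature for t in mem.tokens])
```

In this toy policy, frequently attended entries persist across many frames, which loosely mirrors the idea of a memory that is global rather than limited to a fixed template or recent window.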
Taxonomy
Topics: Advanced Image and Video Retrieval Techniques · Video Surveillance and Tracking Methods · Visual Attention and Saliency Detection
