Multi-attention Associate Prediction Network for Visual Tracking
Xinglong Sun, Haijiang Sun, Shan Jiang, Jiacheng Wang, Xilai Wei, Zhonghe Hu

TL;DR
This paper introduces MAPNet, a novel visual tracking network with specialized matchers and alignment modules that improve feature matching for classification and regression tasks, leading to superior tracking performance.
Contribution
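The paper proposes category-aware and spatial-aware matchers along with a dual alignment module. Together they address two issues in visual tracking networks: a single shared matching block that cannot serve the differing needs of classification and regression, and decision misalignment.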
Findings
Achieves leading performance on five benchmarks.
Outperforms state-of-the-art methods.
Effectively captures category semantics and spatial contexts.
Abstract
Classification-regression prediction networks have achieved impressive success in several modern deep trackers. However, classification and regression are inherently different tasks, so they place diverse, even opposite, demands on feature matching. Existing models ignore this key issue and employ a single unified matching block in both task branches, which degrades decision quality. These models also struggle with decision misalignment. In this paper, we propose a multi-attention associate prediction network (MAPNet) to tackle the above problems. Concretely, two novel matchers, a category-aware matcher and a spatial-aware matcher, are first designed for feature comparison by organically integrating self, cross, channel, or spatial attention. They are capable of fully capturing the category-related semantics for classification and the local spatial…
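To make the idea of task-specific matching more concrete, here is a minimal sketch in plain Python. It is not the MAPNet implementation: the abstract does not specify how the attentions are wired together, so the function names (`cross_attention`, `channel_gate`, `spatial_gate`), the sigmoid gating, and the toy inputs are all assumptions chosen for illustration. The sketch shows search-region tokens attending over template tokens, followed by a channel-wise reweighting (one plausible way to emphasize category semantics) or a position-wise reweighting (one plausible way to emphasize spatial detail).

```python
import math

def softmax(row):
    # Numerically stable softmax over a list of scores.
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def cross_attention(search, template):
    """Each search token (query) attends over the template tokens (keys and values)."""
    dim = len(search[0])
    out = []
    for q in search:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(dim) for k in template]
        weights = softmax(scores)
        out.append([sum(w * v[c] for w, v in zip(weights, template)) for c in range(dim)])
    return out

def channel_gate(tokens):
    """Hypothetical channel attention: scale each channel by a sigmoid of its mean over tokens."""
    dim = len(tokens[0])
    gates = [sigmoid(sum(t[c] for t in tokens) / len(tokens)) for c in range(dim)]
    return [[t[c] * gates[c] for c in range(dim)] for t in tokens]

def spatial_gate(tokens):
    """Hypothetical spatial attention: scale each token by a sigmoid of its mean over channels."""
    return [[v * sigmoid(sum(t) / len(t)) for v in t] for t in tokens]

# Toy inputs (assumed): 2 template tokens and 3 search tokens, 2 channels each.
template = [[1.0, 0.0], [0.0, 1.0]]
search = [[2.0, 0.0], [0.0, 2.0], [1.0, 1.0]]

matched = cross_attention(search, template)
cls_features = channel_gate(matched)  # classification-style branch (assumed pairing)
reg_features = spatial_gate(matched)  # regression-style branch (assumed pairing)
```

The point of the sketch is only the separation of concerns: both branches share the same cross-attention comparison, but each applies a different reweighting afterwards, instead of one unified matching block feeding both tasks.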
Taxonomy
Topics: Video Surveillance and Tracking Methods · Air Quality Monitoring and Forecasting
