Learning Reinforced Attentional Representation for End-to-End Visual Tracking
Peng Gao, Qiquan Zhang, Fei Wang, Liyi Xiao, Hamido Fujita, Yan Zhang

TL;DR
This paper introduces an end-to-end neural network with hierarchical attention and online correlation filter updates for improved visual object tracking, achieving high accuracy and efficiency.
Contribution
It proposes a novel hierarchical attentional module built from a long short-term memory (LSTM) network and multi-layer perceptrons (MLPs), and integrates a contextual attentional correlation filter that allows end-to-end training and online adaptation.
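This summary does not give the module's exact design, so the following is only a hypothetical sketch of how MLP-based intra-frame attention and LSTM-based inter-frame attention could be combined. It assumes channel attention only (spatial attention is omitted), uses random untrained weights, and every name (`HierarchicalAttentionSketch`, `lstm_step`, `hidden`) is illustrative rather than taken from the paper. The MLP scores channels from the current frame alone, while the LSTM cell carries state across frames, so its weights depend on the frame history.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class HierarchicalAttentionSketch:
    """Hypothetical two-level channel attention (not the authors' code):
    an MLP gives intra-frame weights; an LSTM cell, whose state persists
    across frames, gives inter-frame weights."""

    def __init__(self, channels, hidden, seed=0):
        rng = np.random.default_rng(seed)
        # Intra-frame MLP (squeeze-and-excitation style): C -> hidden -> C.
        self.w1 = 0.1 * rng.standard_normal((hidden, channels))
        self.w2 = 0.1 * rng.standard_normal((channels, hidden))
        # LSTM cell with input and hidden size C; gates stacked as [i, f, o, g].
        self.wx = 0.1 * rng.standard_normal((4 * channels, channels))
        self.wh = 0.1 * rng.standard_normal((4 * channels, channels))
        self.b = np.zeros(4 * channels)
        self.h = np.zeros(channels)
        self.c = np.zeros(channels)

    def lstm_step(self, x):
        # One standard LSTM update; self.h and self.c carry memory to the next frame.
        z = self.wx @ x + self.wh @ self.h + self.b
        i, f, o, g = np.split(z, 4)
        self.c = sigmoid(f) * self.c + sigmoid(i) * np.tanh(g)
        self.h = sigmoid(o) * np.tanh(self.c)
        return self.h

    def __call__(self, feat):
        # feat: (C, H, W) feature map of one frame from a (hypothetical) backbone.
        desc = feat.mean(axis=(1, 2))                               # global average pooling
        intra = sigmoid(self.w2 @ np.maximum(self.w1 @ desc, 0.0))  # MLP with ReLU
        inter = sigmoid(self.lstm_step(desc))                       # memory over past frames
        weights = intra * inter                                     # product stays in (0, 1)
        return feat * weights[:, None, None], weights

# --- Usage on random features for three consecutive frames ---
rng = np.random.default_rng(1)
att = HierarchicalAttentionSketch(channels=8, hidden=4)
for t in range(3):
    feat = rng.standard_normal((8, 16, 16)) + rng.standard_normal((8, 1, 1))
    out, w = att(feat)
print(out.shape)  # (8, 16, 16)
```

Multiplying the two attention vectors is one simple way to let the current frame and the frame history both gate each channel; the paper may fuse them differently.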
Findings
Effective in discriminating and localizing targets
Achieves competitive accuracy on benchmark datasets
Operates with high computational efficiency
Abstract
Although numerous tracking approaches have made tremendous advances over the last decade, achieving high-performance visual tracking remains a challenge. In this paper, we propose an end-to-end network model that learns a reinforced attentional representation for accurate discrimination and localization of the target object. We utilize a novel hierarchical attentional module with long short-term memory and multi-layer perceptrons that leverages both inter- and intra-frame attention to emphasize informative visual patterns. Moreover, we incorporate a contextual attentional correlation filter into the backbone network so that our model can be trained end to end. Our proposed approach not only takes full advantage of informative geometries and semantics but also updates the correlation filters online, without fine-tuning the backbone network, to adapt to variations in the…
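To illustrate what an online correlation-filter update without backbone fine-tuning can look like, here is a minimal single-channel sketch that uses the closed-form context-aware correlation filter of Mueller et al. (CVPR 2017) as a stand-in. It assumes that the attentional weighting is omitted (the summary does not say how it enters the filter), that random arrays replace backbone features, and that the update is a linear interpolation with a hypothetical learning rate `eta`. The function names and the parameters `lam1`, `lam2`, and `sigma` are illustrative. In the Fourier domain the filter is w_hat = conj(a0) * y / (|a0|^2 + lam1 + lam2 * sum_i |a_i|^2), where a0 is the target patch, a_i are context patches, and y is the desired Gaussian response.

```python
import numpy as np

def gaussian_label(h, w, sigma=2.0):
    # Desired response: a Gaussian peak at the patch center.
    ys, xs = np.mgrid[0:h, 0:w]
    cy, cx = h // 2, w // 2
    return np.exp(-((ys - cy) ** 2 + (xs - cx) ** 2) / (2.0 * sigma ** 2))

def train_context_cf(target, contexts, label, lam1=1e-4, lam2=0.5):
    """Closed-form context-aware CF (single channel, Fourier domain).
    Context patches are pushed toward a zero response via the lam2 term."""
    a0 = np.fft.fft2(target)
    y = np.fft.fft2(label)
    denom = np.abs(a0) ** 2 + lam1
    for ctx in contexts:
        denom += lam2 * np.abs(np.fft.fft2(ctx)) ** 2
    return np.conj(a0) * y / denom

def detect(w_hat, patch):
    # Response map r = IFFT(z_hat * w_hat); its peak marks the target location.
    return np.real(np.fft.ifft2(np.fft.fft2(patch) * w_hat))

def update(w_hat, w_new, eta=0.02):
    # Online adaptation by linear interpolation; only the filter changes.
    return (1.0 - eta) * w_hat + eta * w_new

# --- Usage with synthetic data (random patches stand in for backbone features) ---
rng = np.random.default_rng(0)
H, W = 32, 32
target = rng.standard_normal((H, W))
label = gaussian_label(H, W)
w_hat = train_context_cf(target, [rng.standard_normal((H, W)) for _ in range(4)], label)

# Next frame: the target has moved by (+3, -5) pixels (circular shift for simplicity).
moved = np.roll(target, shift=(3, -5), axis=(0, 1))
resp = detect(w_hat, moved)
peak = tuple(int(i) for i in np.unravel_index(np.argmax(resp), resp.shape))
print(peak)  # (19, 11): center (16, 16) plus the (+3, -5) displacement

# Re-center on the detected location, retrain with fresh context, and blend.
recentered = np.roll(moved, shift=(H // 2 - peak[0], W // 2 - peak[1]), axis=(0, 1))
w_new = train_context_cf(recentered, [rng.standard_normal((H, W)) for _ in range(4)], label)
w_hat = update(w_hat, w_new)
```

Because each step is an elementwise operation on FFTs, the per-frame update costs only a few FFTs. This is the usual reason correlation filters are updated online while the heavier feature extractor stays fixed.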
