Learning Reinforced Attentional Representation for End-to-End Visual Tracking

Peng Gao; Qiquan Zhang; Fei Wang; Liyi Xiao; Hamido Fujita; Yan Zhang

arXiv:1908.10009 · cs.CV · January 3, 2020

TL;DR

This paper introduces an end-to-end network for visual object tracking that combines hierarchical attention with a correlation filter updated online, reporting competitive accuracy on benchmark datasets at high computational efficiency.

Contribution

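The paper proposes a novel hierarchical attentional module built from long short-term memory (LSTM) and multi-layer perceptrons (MLPs), and incorporates a contextual attentional correlation filter into the backbone network so that the model can be trained end to end and adapted online.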

Findings

01. Effective at discriminating and localizing targets

02. Achieves competitive accuracy on benchmark datasets

03. Operates with high computational efficiency

Abstract

Although numerous recent tracking approaches have made tremendous advances in the last decade, achieving high-performance visual tracking remains a challenge. In this paper, we propose an end-to-end network model to learn reinforced attentional representation for accurate target object discrimination and localization. We utilize a novel hierarchical attentional module with long short-term memory and multi-layer perceptrons to leverage both inter- and intra-frame attention to effectively facilitate visual pattern emphasis. Moreover, we incorporate a contextual attentional correlation filter into the backbone network to make our model trainable in an end-to-end fashion. Our proposed approach not only takes full advantage of informative geometries and semantics but also updates correlation filters online without fine-tuning the backbone network to enable the adaptation of variations in the…
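
To make the idea of a correlation filter that is updated online without touching the backbone more concrete, here is a minimal sketch in Python with NumPy. It is not the paper's contextual attentional correlation filter: it is a generic single-channel, MOSSE-style filter solved in closed form in the Fourier domain, applied to a raw patch rather than to learned attentional features. The names `CorrelationFilter`, `gaussian_label`, and `locate`, and the values of the regularizer `lam` and learning rate `lr`, are assumptions chosen for illustration.

```python
import numpy as np


def gaussian_label(shape, center, sigma=2.0):
    """Desired response: a 2-D Gaussian peaked at `center` (row, col)."""
    rows = np.arange(shape[0])[:, None]
    cols = np.arange(shape[1])[None, :]
    d2 = (rows - center[0]) ** 2 + (cols - center[1]) ** 2
    return np.exp(-d2 / (2.0 * sigma ** 2))


class CorrelationFilter:
    """Generic single-channel correlation filter (illustrative only)."""

    def __init__(self, lam=1e-2, lr=0.1):
        self.lam = lam    # ridge regularizer (assumed value)
        self.lr = lr      # online interpolation rate (assumed value)
        self.num = None   # running numerator   G * conj(F)
        self.den = None   # running denominator |F|^2

    def update(self, patch, label):
        """Closed-form update from one patch; no gradient steps involved."""
        F = np.fft.fft2(patch)
        G = np.fft.fft2(label)
        num = G * np.conj(F)
        den = (F * np.conj(F)).real
        if self.num is None:
            self.num, self.den = num, den
        else:
            # Running average keeps the filter adapting to appearance changes.
            self.num = (1 - self.lr) * self.num + self.lr * num
            self.den = (1 - self.lr) * self.den + self.lr * den

    def respond(self, patch):
        """Response map for a new patch; its peak marks the target position."""
        H = self.num / (self.den + self.lam)
        return np.fft.ifft2(np.fft.fft2(patch) * H).real


def locate(response):
    """Return the (row, col) of the response peak as plain ints."""
    idx = np.unravel_index(np.argmax(response), response.shape)
    return tuple(int(i) for i in idx)
```

In this sketch, a tracker would call `update` once on the first frame with a Gaussian label centered on the target, then alternate `respond` and `update` on later frames. Because the update is a closed-form running average, the feature extractor that produces the patch needs no fine-tuning, which is the property the abstract emphasizes.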
