Hierarchical Attentive Recurrent Tracking
Adam R. Kosiorek, Alex Bewley, Ingmar Posner

TL;DR
This paper introduces a hierarchical attentive recurrent model, inspired by human visual attention, for class-agnostic single-object tracking in cluttered videos. Successive attention layers first discard most of the background and then focus on features specific to the tracked object.
Contribution
It presents a novel, fully differentiable hierarchical attention framework for single-object tracking that is trained by gradient methods, with auxiliary-task terms added to the loss to improve training convergence.
Findings
Effective in cluttered environments
Performs well on KTH and KITTI datasets
Outperforms baseline methods
Abstract
Class-agnostic object tracking is particularly difficult in cluttered environments as target specific discriminative models cannot be learned a priori. Inspired by how the human visual cortex employs spatial attention and separate "where" and "what" processing pathways to actively suppress irrelevant visual features, this work develops a hierarchical attentive recurrent model for single object tracking in videos. The first layer of attention discards the majority of background by selecting a region containing the object of interest, while the subsequent layers tune in on visual features particular to the tracked object. This framework is fully differentiable and can be trained in a purely data driven fashion by gradient methods. To improve training convergence, we augment the loss function with terms for a number of auxiliary tasks relevant for tracking. Evaluation of the proposed model…
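The two-stage attention and the augmented loss described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the hard rectangular crop, the sigmoid feature mask, and the weighted-sum form of the loss are all assumptions made for illustration. In particular, a hard crop is not differentiable with respect to the box, whereas the abstract states that the actual framework is fully differentiable.

```python
import numpy as np

def spatial_attention(frame, box):
    """First attention layer (hypothetical): keep only the region around the object.

    box = (top, left, height, width) in pixels. A hard crop is used here for
    brevity; it is a non-differentiable stand-in for the paper's mechanism.
    """
    top, left, height, width = box
    return frame[top:top + height, left:left + width]

def feature_attention(glimpse, mask_logits):
    """Subsequent attention layer (hypothetical): soft-mask features inside the glimpse.

    mask_logits has the same shape as glimpse; a sigmoid maps it to (0, 1).
    """
    mask = 1.0 / (1.0 + np.exp(-mask_logits))
    return glimpse * mask

def total_loss(tracking_loss, auxiliary_losses, weights):
    """Tracking loss plus weighted auxiliary terms (the weighting scheme is assumed)."""
    return tracking_loss + sum(w * l for w, l in zip(weights, auxiliary_losses))

# Toy usage on a 6x6 single-channel "frame".
frame = np.arange(36, dtype=float).reshape(6, 6)
glimpse = spatial_attention(frame, (1, 2, 3, 3))            # 3x3 region
attended = feature_attention(glimpse, np.zeros_like(glimpse))  # uniform mask of 0.5
loss = total_loss(1.0, [0.5, 0.25], [0.1, 0.2])
```

In a trainable version, the box and mask logits would be predicted by the recurrent model at every frame, and the crop would be replaced by a differentiable sampling operation so that gradients reach the attention parameters.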
Taxonomy
Topics: Video Surveillance and Tracking Methods · Impact of Light on Environment and Health · Air Quality Monitoring and Forecasting
