Learning Dynamic Memory Networks for Object Tracking
Tianyu Yang, Antoni B. Chan

TL;DR
This paper introduces a dynamic memory network for object tracking that adapts to changes in the target's appearance in real time. It combines an LSTM-based memory controller, an attention mechanism, and gated residual template learning to reach high accuracy at high speed.
Contribution
It proposes a novel feed-forward tracking approach with an external memory, which enables efficient adaptation without online fine-tuning and allows the memory capacity to scale.
Findings
Outperforms state-of-the-art trackers on the OTB and VOT datasets
Runs in real time at 50 fps
Effectively adapts to appearance variations
Abstract
Template-matching methods for visual tracking have gained popularity recently due to their comparable performance and fast speed. However, they lack effective ways to adapt to changes in the target object's appearance, making their tracking accuracy still far from state-of-the-art. In this paper, we propose a dynamic memory network to adapt the template to the target's appearance variations during tracking. An LSTM is used as a memory controller, where the input is the search feature map and the outputs are the control signals for the reading and writing process of the memory block. As the location of the target is at first unknown in the search feature map, an attention mechanism is applied to concentrate the LSTM input on the potential target. To prevent aggressive model adaptivity, we apply gated residual template learning to control the amount of retrieved memory that is used to…
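The memory read and the gated residual template described in the abstract can be illustrated with a small sketch. This is not the authors' implementation. It assumes that memory slots are flattened template vectors, that reading is a softmax over cosine similarities to a read key, and that the gated retrieved memory is added to the initial template as a residual. The names `read_memory`, `gated_residual_template`, `strength`, and `gate_logits` are hypothetical, and the key and gate logits are fixed by hand here, whereas in the paper the control signals come from the LSTM controller.

```python
import numpy as np


def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(x - np.max(x))
    return e / e.sum()


def read_memory(memory, read_key, strength=1.0):
    """Soft read from memory (assumed scheme, not taken from the paper).

    memory:   (num_slots, dim) array, one flattened template per slot
    read_key: (dim,) array, assumed to be emitted by the controller
    strength: scalar that sharpens the read weights (hypothetical name)
    Returns the retrieved template and the read weights.
    """
    eps = 1e-8
    norms = np.linalg.norm(memory, axis=1) * np.linalg.norm(read_key)
    sims = memory @ read_key / (norms + eps)   # cosine similarity per slot
    weights = softmax(strength * sims)         # weights sum to 1
    retrieved = weights @ memory               # weighted sum of slots
    return retrieved, weights


def gated_residual_template(initial_template, retrieved, gate_logits):
    """Add a gated portion of the retrieved memory to the initial template.

    The sigmoid gate lies in (0, 1), so it limits how much retrieved
    memory is mixed in (assumption: element-wise gate, additive residual).
    """
    gate = 1.0 / (1.0 + np.exp(-gate_logits))
    return initial_template + gate * retrieved


# Usage with toy values: three memory slots of dimension 4.
memory = np.array([[1.0, 0.0, 0.0, 0.0],
                   [0.0, 1.0, 0.0, 0.0],
                   [0.0, 0.0, 1.0, 0.0]])
read_key = np.array([0.0, 1.0, 0.0, 0.0])      # most similar to slot 1
initial_template = np.full(4, 0.5)

retrieved, weights = read_memory(memory, read_key, strength=5.0)
final_template = gated_residual_template(
    initial_template, retrieved, gate_logits=np.zeros(4)
)
print("read weights:", weights)
print("final template:", final_template)
```

In this sketch, a strongly negative gate logit leaves the initial template almost unchanged, which is one way to read the abstract's goal of preventing aggressive model adaptivity.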
Taxonomy
Topics: Video Surveillance and Tracking Methods · Visual Attention and Saliency Detection · Human Pose and Action Recognition
Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings · Sigmoid Activation · Tanh Activation · Softmax · Gated Recurrent Unit · Dynamic Memory Network · Memory Network · Long Short-Term Memory
