Deep Reinforcement Learning for Visual Object Tracking in Videos
Da Zhang, Hamid Maei, Xin Wang, Yuan-Fang Wang

TL;DR
This paper presents an end-to-end deep reinforcement learning approach for visual object tracking in videos. A recurrent convolutional neural network predicts the target's bounding box at every frame, and the authors report state-of-the-art performance on an existing tracking benchmark.
Contribution
It introduces what the authors describe as the first neural network tracker to combine convolutional networks, recurrent networks, and reinforcement learning for video object tracking.
Findings
Reports state-of-the-art performance on an existing tracking benchmark.
Operates at frame rates faster than real-time.
To the authors' knowledge, first to combine CNN, RNN, and RL for tracking.
Abstract
In this paper we introduce a fully end-to-end approach for visual tracking in videos that learns to predict the bounding box locations of a target object at every frame. An important insight is that the tracking problem can be considered as a sequential decision-making process and historical semantics encode highly relevant information for future decisions. Based on this intuition, we formulate our model as a recurrent convolutional neural network agent that interacts with a video over time, and our model can be trained with reinforcement learning (RL) algorithms to learn good tracking policies that pay attention to continuous, inter-frame correlation and maximize tracking performance in the long run. The proposed tracking algorithm achieves state-of-the-art performance in an existing tracking benchmark and operates at frame-rates faster than real-time. To the best of our knowledge, our…
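The abstract does not name a specific RL algorithm or network layout, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes a REINFORCE-style policy gradient, a Gaussian policy over box coordinates (x, y, w, h), per-frame IoU as the reward, and a plain tanh recurrence over precomputed per-frame feature vectors standing in for the convolutional encoder. All names (`iou`, `RecurrentTrackerSketch`, `reinforce_grad_W_o`) are hypothetical, and only the output-layer gradient is computed; a full tracker would also backpropagate through the recurrent and convolutional layers.

```python
import numpy as np


def iou(box_a, box_b):
    """Intersection-over-union of two boxes given as (x, y, w, h)."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    # Sampled boxes may have negative sizes; treat those as empty.
    aw, ah, bw, bh = max(aw, 0.0), max(ah, 0.0), max(bw, 0.0), max(bh, 0.0)
    iw = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    ih = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = iw * ih
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


class RecurrentTrackerSketch:
    """Toy recurrent agent: per-frame features -> hidden state -> box mean."""

    def __init__(self, feat_dim, hidden_dim, sigma=0.05, seed=0):
        self.rng = np.random.default_rng(seed)
        self.W_x = self.rng.normal(0.0, 0.1, (hidden_dim, feat_dim))
        self.W_h = self.rng.normal(0.0, 0.1, (hidden_dim, hidden_dim))
        self.W_o = self.rng.normal(0.0, 0.1, (4, hidden_dim))
        self.sigma = sigma  # assumed fixed standard deviation of the policy

    def rollout(self, features):
        """Run over one video; returns hidden states, box means, sampled boxes."""
        h = np.zeros(self.W_h.shape[0])
        hiddens, means, samples = [], [], []
        for x in features:
            # The hidden state carries history from earlier frames.
            h = np.tanh(self.W_x @ x + self.W_h @ h)
            mu = self.W_o @ h
            box = mu + self.sigma * self.rng.standard_normal(4)
            hiddens.append(h)
            means.append(mu)
            samples.append(box)
        return np.array(hiddens), np.array(means), np.array(samples)


def reinforce_grad_W_o(hiddens, means, samples, rewards, sigma, baseline=0.0):
    """REINFORCE gradient estimate of expected return w.r.t. W_o only."""
    rewards = np.asarray(rewards, dtype=float)
    # Reward-to-go: each decision is credited with current and future rewards.
    returns = np.cumsum(rewards[::-1])[::-1]
    # Gradient of log N(box | mu, sigma^2 I) with respect to mu, shape (T, 4).
    score = (samples - means) / sigma**2
    weighted = (returns - baseline)[:, None] * score
    return weighted.T @ hiddens  # shape (4, hidden_dim)
```

A training step under these assumptions would call `rollout` on one video's features, score each sampled box against the ground truth with `iou`, and then take a gradient ascent step such as `agent.W_o += lr * grad` with a hypothetical learning rate `lr`. Using reward-to-go rather than the single-frame reward is one simple way to reflect the abstract's emphasis on long-run tracking performance.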
Taxonomy
Topics: Video Surveillance and Tracking Methods · Impact of Light on Environment and Health · Human Pose and Action Recognition
