Deep Reinforcement Learning for Visual Object Tracking in Videos

Da Zhang; Hamid Maei; Xin Wang; Yuan-Fang Wang

arXiv:1701.08936·cs.CV·April 12, 2017·101 cites

TL;DR

This paper presents an end-to-end deep reinforcement learning approach to visual object tracking in videos: a recurrent convolutional neural network predicts the target's bounding box at every frame and achieves state-of-the-art performance on an existing tracking benchmark.

Contribution

It introduces the first neural network tracker that combines convolutional and recurrent networks with reinforcement learning for video object tracking.

Findings

01. Achieves state-of-the-art tracking performance.

02. Operates faster than real-time.

03. First to combine CNN, RNN, and RL for tracking.

Abstract

In this paper, we introduce a fully end-to-end approach for visual tracking in videos that learns to predict the bounding box locations of a target object at every frame. An important insight is that the tracking problem can be considered as a sequential decision-making process, and historical semantics encode highly relevant information for future decisions. Based on this intuition, we formulate our model as a recurrent convolutional neural network agent that interacts with a video over time. The model can be trained with reinforcement learning (RL) algorithms to learn good tracking policies that pay attention to continuous, inter-frame correlation and maximize tracking performance in the long run. The proposed tracking algorithm achieves state-of-the-art performance in an existing tracking benchmark and operates at frame-rates faster than real-time. To the best of our knowledge, our…
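The sequential decision-making framing in the abstract can be illustrated with a small sketch. It assumes a per-frame reward equal to the intersection-over-union (IoU) between the predicted and ground-truth boxes, and a discount factor `gamma` to express "performance in the long run". Neither choice is stated in the abstract, and the names `iou`, `episode_rewards`, and `discounted_returns` are hypothetical. This is not the authors' implementation.

```python
from typing import List, Tuple

# A box is (x, y, width, height); this layout is an assumption for the sketch.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    # Width and height of the overlap region, clamped at zero when disjoint.
    iw = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    ih = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = iw * ih
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def episode_rewards(predicted: List[Box], truth: List[Box]) -> List[float]:
    """One reward per frame: how well the predicted box overlaps the target."""
    return [iou(p, g) for p, g in zip(predicted, truth)]


def discounted_returns(rewards: List[float], gamma: float = 0.9) -> List[float]:
    """Return-to-go for each frame: r_t + gamma * r_{t+1} + gamma^2 * r_{t+2} + ..."""
    returns = [0.0] * len(rewards)
    running = 0.0
    # Accumulate from the last frame backwards so each step reuses the next one.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns
```

In a policy-gradient setup, returns like these would weight the log-probability of each predicted box, so that a prediction is credited for the tracking quality of the frames that follow it as well as its own.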

Taxonomy

Topics: Video Surveillance and Tracking Methods · Impact of Light on Environment and Health · Human Pose and Action Recognition