End-to-end Active Object Tracking and Its Real-world Deployment via Reinforcement Learning

Wenhan Luo; Peng Sun; Fangwei Zhong; Wei Liu; Tong Zhang; Yizhou Wang

arXiv:1808.03405·cs.CV·February 14, 2019·6 cites

TL;DR

This paper presents an end-to-end deep reinforcement learning approach to active object tracking. A tracker trained in simulation transfers to real-world deployment, reducing the manual tuning and image-labeling effort that conventional pipelines require.

Contribution

The authors introduce an end-to-end reinforcement learning framework for active object tracking, built on an environment augmentation technique and a customized reward function, and demonstrate successful simulation-to-real transfer.

Findings

01. The system generalizes to unseen scenarios in simulation.

02. It can recover from tracking failures effectively.

03. Successful real-world robot deployment demonstrates practical viability.

Abstract

We study active object tracking, where a tracker takes visual observations (i.e., frame sequences) as input and produces the corresponding camera control signals as output (e.g., move forward, turn left). Conventional methods tackle the tracking and camera control tasks separately, and the resulting system is difficult to tune jointly. These methods also require significant human effort for image labeling and expensive trial-and-error system tuning in the real world. To address these issues, we propose an end-to-end solution via deep reinforcement learning. A ConvNet-LSTM function approximator is adopted for direct frame-to-action prediction. We further propose an environment augmentation technique and a customized reward function, which are crucial for successful training. The tracker trained in simulators (ViZDoom and Unreal Engine) demonstrates good…
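The abstract mentions a customized reward function but does not give its form here. As a rough illustration only, the sketch below shows one plausible shape for such a reward: it peaks when the target sits straight ahead at a desired distance and falls off with positional and angular deviation. The function name `tracking_reward`, the local coordinate convention, and every constant (`d`, `c`, `lam`, `a_max`) are assumptions made for this example, not values taken from the text above.

```python
import math


def tracking_reward(x, y, yaw, *, d=1.0, c=2.0, lam=0.5, a_max=1.0):
    """Hypothetical shaped reward for an active tracker.

    x, y  : target position in the tracker's local frame
            (assumed convention: y points straight ahead).
    yaw   : angular deviation of the target from the tracker's
            heading, in radians.
    d     : desired following distance (assumed value).
    c     : normalizer for the positional error (assumed value).
    lam   : weight on the angular penalty (assumed value).
    a_max : reward when the target is exactly at (0, d) with yaw 0.
    """
    # Euclidean distance between the target and the ideal spot (0, d).
    position_error = math.hypot(x, y - d) / c
    # Penalize the target drifting away from the tracker's heading.
    heading_error = lam * abs(yaw)
    return a_max - (position_error + heading_error)


# The reward is maximal at the ideal pose and decreases as the target drifts.
ideal = tracking_reward(0.0, 1.0, 0.0)
drifted = tracking_reward(1.0, 1.0, 0.0)
```

A reward of this kind gives a dense learning signal at every step rather than only at the moment the target is lost, which is one common motivation for shaping rewards in tracking-style tasks.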

Taxonomy

Topics: Video Surveillance and Tracking Methods · Human Pose and Action Recognition · Human-Animal Interaction Studies