Deep Q-learning from Demonstrations

Todd Hester; Matej Vecerik; Olivier Pietquin; Marc Lanctot; Tom Schaul; Bilal Piot; Dan Horgan; John Quan; Andrew Sendonaris; Gabriel Dulac-Arnold; Ian Osband; John Agapiou; Joel Z. Leibo; Audrunas Gruslys

arXiv:1704.03732·cs.AI·November 27, 2017·307 cites

TL;DR

This paper introduces Deep Q-learning from Demonstrations (DQfD), an algorithm that uses a small set of demonstration data to accelerate deep reinforcement learning. It starts with better scores than PDD DQN on 41 of 42 games and achieves state-of-the-art results on 11 of them.

Contribution

The paper proposes DQfD, a novel algorithm combining demonstration data with deep Q-learning, improving learning speed and performance in complex environments.

Findings

01. DQfD has better initial performance than Prioritized Dueling Double DQN (PDD DQN) on 41 of 42 games.

02. DQfD learns to outperform the best demonstration it was given in 14 of 42 games.

03. DQfD achieves state-of-the-art results on 11 games.

Abstract

Deep reinforcement learning (RL) has achieved several high profile successes in difficult decision-making problems. However, these algorithms typically require a huge amount of data before they reach reasonable performance. In fact, their performance during learning can be extremely poor. This may be acceptable for a simulator, but it severely limits the applicability of deep RL to many real-world tasks, where the agent must learn in the real environment. In this paper we study a setting where the agent may access data from previous control of the system. We present an algorithm, Deep Q-learning from Demonstrations (DQfD), that leverages small sets of demonstration data to massively accelerate the learning process even from relatively small amounts of demonstration data and is able to automatically assess the necessary ratio of demonstration data while learning thanks to a prioritized…
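The abstract is cut off at "prioritized", so the mechanism is not spelled out here. As a rough illustration only, the sketch below assumes it refers to prioritized sampling from a replay buffer that holds both demonstration and agent-generated transitions. Under that assumption, the share of demonstration data in each batch is not fixed by hand but falls out of the priorities. The class `MixedReplayBuffer`, the helper `demo_fraction`, and the constants `demo_bonus`, `agent_bonus`, and `alpha` are hypothetical names and values chosen for this example, not the authors' implementation.

```python
import random
from dataclasses import dataclass


@dataclass
class Transition:
    state: int
    action: int
    reward: float
    next_state: int
    is_demo: bool          # True if the transition came from demonstration data
    priority: float = 1.0  # sampling priority, updated from the TD error


class MixedReplayBuffer:
    """Hypothetical buffer mixing demonstration and agent transitions."""

    def __init__(self, demo_bonus=1.0, agent_bonus=0.001, alpha=0.4, seed=None):
        # Assumed constants: a small bonus keeps every priority positive,
        # and a larger bonus for demonstrations keeps them in circulation.
        self.demo_bonus = demo_bonus
        self.agent_bonus = agent_bonus
        self.alpha = alpha  # 0 gives uniform sampling, 1 fully proportional
        self.rng = random.Random(seed)
        self.items = []

    def add(self, transition):
        self.items.append(transition)

    def set_td_error(self, index, td_error):
        # Priority grows with the magnitude of the TD error.
        t = self.items[index]
        bonus = self.demo_bonus if t.is_demo else self.agent_bonus
        t.priority = abs(td_error) + bonus

    def sample(self, batch_size):
        # Proportional prioritization: P(i) is proportional to priority_i ** alpha.
        weights = [t.priority ** self.alpha for t in self.items]
        return self.rng.choices(self.items, weights=weights, k=batch_size)


def demo_fraction(batch):
    """Fraction of a sampled batch that came from demonstrations."""
    return sum(t.is_demo for t in batch) / len(batch)
```

In this sketch, a batch drawn with `sample` leans toward demonstration transitions while their TD errors are large and shifts toward agent-generated transitions as those errors shrink, which is one way a demonstration ratio could be set automatically during learning.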

Taxonomy

Topics: Reinforcement Learning in Robotics · Sports Analytics and Performance · Software Engineering Research

Methods: Dense Connections · Convolution · Q-Learning · Deep Q-Network