Deep Q-learning from Demonstrations
Todd Hester, Matej Vecerik, Olivier Pietquin, Marc Lanctot, Tom Schaul, Bilal Piot, Dan Horgan, John Quan, Andrew Sendonaris, Gabriel Dulac-Arnold, Ian Osband, John Agapiou, Joel Z. Leibo, Audrunas Gruslys

TL;DR
This paper introduces Deep Q-learning from Demonstrations (DQfD), an algorithm that uses a small set of demonstration data to accelerate deep reinforcement learning. It starts with better scores than PDD DQN on 41 of 42 games and achieves state-of-the-art results in 11 games.
Contribution
The paper proposes DQfD, a novel algorithm combining demonstration data with deep Q-learning, improving learning speed and performance in complex environments.
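As a rough illustration of what "combining demonstration data with deep Q-learning" can look like, the sketch below adds a large-margin supervised term to a one-step TD loss for a single transition. This page does not spell out the loss, so the formulation here is an assumption based on how DQfD is usually described: the supervised term pushes the demonstrator's action to have the highest Q-value by at least a margin, and it is applied only to demonstration transitions. The function names, the margin of 0.8, the weight of 1.0, and the plain (non-double) Q-learning target are illustrative choices; the n-step and regularization terms of the full method are omitted.

```python
def margin_loss(q_values, expert_action, margin=0.8):
    """Large-margin loss: max_a [Q(s, a) + l(a_E, a)] - Q(s, a_E).

    l(a_E, a) is 0 for the expert action and `margin` otherwise
    (the margin value is an illustrative assumption).
    """
    augmented = [
        q + (0.0 if a == expert_action else margin)
        for a, q in enumerate(q_values)
    ]
    return max(augmented) - q_values[expert_action]


def td_loss(q_values, action, reward, next_q_values, gamma=0.99, done=False):
    """Squared one-step TD error with a plain Q-learning target."""
    target = reward if done else reward + gamma * max(next_q_values)
    return (target - q_values[action]) ** 2


def combined_loss(q_values, action, reward, next_q_values,
                  is_demo, margin_weight=1.0):
    """TD loss for every transition; margin loss only for demonstrations."""
    loss = td_loss(q_values, action, reward, next_q_values)
    if is_demo:
        # For a demonstration transition, `action` is the expert's action.
        loss += margin_weight * margin_loss(q_values, action)
    return loss
```

With Q-values `[1.0, 2.0, 0.5]`, the margin term is zero when the expert chose action 1 (already the best by more than the margin) and positive when the expert chose action 0, which is what nudges the network toward the demonstrator's choices.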
Findings
DQfD starts with better scores than Prioritized Dueling Double DQN (PDD DQN) on 41 of 42 games.
DQfD learns to outperform the best demonstration in 14 of 42 games.
DQfD achieves state-of-the-art results in 11 games.
Abstract
Deep reinforcement learning (RL) has achieved several high profile successes in difficult decision-making problems. However, these algorithms typically require a huge amount of data before they reach reasonable performance. In fact, their performance during learning can be extremely poor. This may be acceptable for a simulator, but it severely limits the applicability of deep RL to many real-world tasks, where the agent must learn in the real environment. In this paper we study a setting where the agent may access data from previous control of the system. We present an algorithm, Deep Q-learning from Demonstrations (DQfD), that leverages small sets of demonstration data to massively accelerate the learning process even from relatively small amounts of demonstration data and is able to automatically assess the necessary ratio of demonstration data while learning thanks to a prioritized…
Taxonomy
Topics: Reinforcement Learning in Robotics · Sports Analytics and Performance · Software Engineering Research
Methods: Dense Connections · Convolution · Q-Learning · Deep Q-Network
