Jointly Pre-training with Supervised, Autoencoder, and Value Losses for Deep Reinforcement Learning
Gabriel V. de la Cruz Jr., Yunshu Du, Matthew E. Taylor

TL;DR
This paper introduces a joint pre-training strategy combining supervised, autoencoder, and value losses to incorporate human knowledge into Deep Reinforcement Learning, significantly improving learning efficiency and performance in Atari games.
Contribution
It proposes a novel pre-training method that jointly optimizes multiple losses, enhancing feature learning and accelerating DRL training with human demonstrations.
Findings
Pre-training improves Atari game performance with fewer interactions.
The method outperforms state-of-the-art algorithms in Pong and MsPacman.
Pre-training is lightweight and easy to implement.
Abstract
Deep Reinforcement Learning (DRL) algorithms are known to be data inefficient. One reason is that a DRL agent learns both the features and the policy tabula rasa. Integrating prior knowledge into DRL algorithms is one way to improve learning efficiency since it helps to build helpful representations. In this work, we consider incorporating human knowledge to accelerate the asynchronous advantage actor-critic (A3C) algorithm by pre-training on a small amount of non-expert human demonstrations. We leverage the supervised autoencoder framework and propose a novel pre-training strategy that jointly trains a weighted supervised classification loss, an unsupervised reconstruction loss, and an expected return loss. The resulting pre-trained model learns more useful features compared to independently training in supervised or unsupervised fashion. Our pre-training method drastically improved the…
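The joint objective described in the abstract can be sketched as a weighted sum of three terms. The snippet below is only an illustration under stated assumptions, not the authors' implementation: the function names, the per-class weights `class_weights`, and the mixing coefficients `lambda_ae` and `lambda_v` are hypothetical, and the abstract does not say how the terms are actually weighted. It uses plain Python on a single demonstration sample: cross-entropy on the human action, mean squared error on the reconstructed frame, and squared error between the predicted value and the observed return.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of action logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def supervised_loss(logits, action, class_weights=None):
    """Cross-entropy on the demonstrated action.

    class_weights is a hypothetical per-action weight list (assumption);
    with None, every action is weighted 1.0.
    """
    probs = softmax(logits)
    weight = 1.0 if class_weights is None else class_weights[action]
    return -weight * math.log(probs[action])


def reconstruction_loss(reconstruction, frame):
    """Mean squared error between the decoded frame and the input frame."""
    return sum((r - f) ** 2 for r, f in zip(reconstruction, frame)) / len(frame)


def value_loss(value, observed_return):
    """Squared error between the predicted value and the observed return."""
    return (value - observed_return) ** 2


def joint_pretraining_loss(logits, action, reconstruction, frame, value,
                           observed_return, class_weights=None,
                           lambda_ae=1.0, lambda_v=1.0):
    """Sum the three terms; lambda_ae and lambda_v are assumed coefficients."""
    l_sup = supervised_loss(logits, action, class_weights)
    l_ae = reconstruction_loss(reconstruction, frame)
    l_v = value_loss(value, observed_return)
    total = l_sup + lambda_ae * l_ae + lambda_v * l_v
    return total, (l_sup, l_ae, l_v)


if __name__ == "__main__":
    total, parts = joint_pretraining_loss(
        logits=[0.0, 0.0], action=0,
        reconstruction=[0.5, 0.5], frame=[1.0, 0.0],
        value=1.0, observed_return=2.0,
    )
    print(total, parts)
```

In a real network the three terms would share one encoder, so a single backward pass through the summed loss updates the shared features with all three signals at once; that sharing is the point of training the losses jointly rather than one after another.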
Taxonomy
Topics: Reinforcement Learning in Robotics · Adversarial Robustness in Machine Learning · Machine Learning and Data Classification
Methods: Entropy Regularization · Dense Connections · Softmax · Convolution · A3C
