Expert-augmented actor-critic for ViZDoom and Montezumas Revenge

Michał Garmulewicz; Henryk Michalewski; Piotr Miłoś

arXiv:1809.03447·cs.LG·September 11, 2018·6 cites

TL;DR

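This paper introduces an expert-augmented actor-critic algorithm and evaluates it on two sparse-reward environments: Montezuma's Revenge and a demanding maze from the ViZDoom suite. On Montezuma's Revenge the trained agent consistently scores above 27,000 points, and with an appropriate choice of hyperparameters the algorithm surpasses the performance of the expert data.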

Contribution

The paper presents an expert-augmented actor-critic method for sparse-reward environments that, with an appropriate choice of hyperparameters, surpasses the performance of the expert data.

Findings

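01 An agent trained with the method consistently scores above 27,000 points on Montezuma's Revenge, in many experiments beating the first world.

02 With an appropriate choice of hyperparameters, the algorithm surpasses the performance of the expert data.

03 In a number of experiments, the agent triggered a previously unreported bug in Montezuma's Revenge that allowed it to score more than 800,000 points.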

Abstract

We propose an expert-augmented actor-critic algorithm, which we evaluate on two environments with sparse rewards: Montezuma's Revenge and a demanding maze from the ViZDoom suite. In the case of Montezuma's Revenge, an agent trained with our method achieves very good results, consistently scoring above 27,000 points (in many experiments, beating the first world). With an appropriate choice of hyperparameters, our algorithm surpasses the performance of the expert data. In a number of experiments, we have observed an unreported bug in Montezuma's Revenge that allowed the agent to score more than 800,000 points.
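
The abstract does not spell out the loss function, so the following is only a minimal sketch of the general idea behind "expert-augmented" actor-critic training: a standard advantage actor-critic loss on the agent's own experience, plus an imitation term computed on expert state-action pairs. The function names, the weight `lambda_expert`, and the choice of a plain negative log-likelihood for the expert term are all assumptions made for illustration and may differ from what the paper actually uses.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of action logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def actor_critic_loss(logits, action, ret, value):
    """One-sample advantage actor-critic loss (policy term plus value term)."""
    advantage = ret - value  # treated as a constant in the policy term
    log_prob = math.log(softmax(logits)[action])
    policy_loss = -advantage * log_prob
    value_loss = 0.5 * (ret - value) ** 2
    return policy_loss + value_loss

def expert_loss(expert_batch, lambda_expert):
    """Hypothetical imitation term: weighted mean negative
    log-likelihood of the expert's actions under the current policy."""
    nll = [-math.log(softmax(logits)[a]) for logits, a in expert_batch]
    return lambda_expert * sum(nll) / len(nll)

def augmented_loss(agent_batch, expert_batch, lambda_expert=1.0):
    """Assumed combination: mean RL loss on agent data plus the expert term."""
    rl = sum(actor_critic_loss(*s) for s in agent_batch) / len(agent_batch)
    return rl + expert_loss(expert_batch, lambda_expert)

# Usage: each agent sample is (logits, action, return, value estimate);
# each expert sample is (logits at the expert state, expert action).
agent_batch = [([0.0, 0.0], 0, 1.0, 1.0)]
expert_batch = [([0.0, 0.0], 0)]
total = augmented_loss(agent_batch, expert_batch, lambda_expert=1.0)
```

In this sketch, setting `lambda_expert` to zero recovers the plain actor-critic loss, which is one way to see the expert term as an add-on that matters most when environment rewards are too sparse to guide exploration.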

Taxonomy

Topics: Reinforcement Learning in Robotics · Artificial Intelligence in Games · Evolutionary Algorithms and Applications