GAN-based Intrinsic Exploration For Sample Efficient Reinforcement Learning

Doğay Kamar (1), Nazım Kemal Üre (1, 2), Gözde Ünal (1, 2) ((1) Faculty of Computer and Informatics, Istanbul Technical University; (2) Artificial Intelligence and Data Science Research Center, Istanbul Technical University)

arXiv:2206.14256·cs.LG·June 30, 2022

TL;DR

This paper introduces a GAN-based intrinsic reward method for reinforcement learning that improves exploration efficiency in environments with sparse or no rewards by encouraging the agent to visit out-of-distribution states.

Contribution

The paper presents a novel GAN-based intrinsic reward module that guides exploration in RL environments by identifying unseen states and rewarding the agent for visiting them.

Findings

1. Effective exploration in Super Mario Bros without rewards
2. Improved exploration in Montezuma's Revenge with sparse rewards
3. Demonstrated capability of GAN-based intrinsic rewards to explore efficiently

Abstract

In this study, we address the problem of efficient exploration in reinforcement learning. The most common exploration approaches depend on random action selection; however, these approaches do not work well in environments with sparse or no rewards. We propose a Generative Adversarial Network-based Intrinsic Reward Module that learns the distribution of the observed states and emits an intrinsic reward that is high for out-of-distribution states, in order to lead the agent to unexplored states. We evaluate our approach in Super Mario Bros for a no-reward setting and in Montezuma's Revenge for a sparse-reward setting, and show that it is capable of exploring efficiently. We discuss a few weaknesses and conclude with directions for future work.
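
The idea in the abstract, an intrinsic reward that grows as a state looks less like the states seen so far, can be pictured with a minimal sketch. Everything here is an assumption for illustration: the abstract does not give the reward formula or the network design, so `toy_discriminator` is a hand-written nearest-neighbour scorer standing in for a trained GAN discriminator, and `intrinsic_reward` uses a hypothetical `1 - score` rule rather than the paper's computation.

```python
import math


def toy_discriminator(state, visited, bandwidth=1.0):
    """Stand-in for a trained GAN discriminator (assumption, not the paper's network).

    Returns a score in [0, 1]: 1 when `state` coincides with a previously
    visited state, approaching 0 as it moves away from all of them.
    """
    # Squared Euclidean distance to the nearest visited state.
    nearest_sq = min(
        sum((a - b) ** 2 for a, b in zip(state, v)) for v in visited
    )
    return math.exp(-nearest_sq / (2.0 * bandwidth ** 2))


def intrinsic_reward(state, visited, scale=1.0):
    """Hypothetical reward rule: high for states that look out of distribution."""
    return scale * (1.0 - toy_discriminator(state, visited))


# States the agent has already observed (toy 2-D coordinates).
visited_states = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0)]

r_seen = intrinsic_reward((1.0, 0.0), visited_states)    # already visited
r_nearby = intrinsic_reward((1.5, 0.0), visited_states)  # close to visited states
r_novel = intrinsic_reward((6.0, 5.0), visited_states)   # far from everything seen

print(r_seen, r_nearby, r_novel)
```

In this toy setting the reward is zero for a state that was already visited, small for a state near the visited region, and close to one for a distant state, which is the ordering an exploration bonus of this kind is meant to produce. In a sparse-reward setting such a bonus would typically be added to the environment reward; how the paper combines the two is not stated in the abstract.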
