The impact of intrinsic rewards on exploration in Reinforcement Learning

Aya Kayal; Eduardo Pignatelli; Laura Toni

arXiv:2501.11533·cs.AI·January 22, 2025

Open Access

TL;DR

This paper investigates how the level of diversity imposed by intrinsic rewards (state, policy, or skill) shapes exploration in Reinforcement Learning. It finds that State Count performs best with low-dimensional observations, while Maximum Entropy is more robust with RGB observations.

Contribution

The paper empirically compares four intrinsic rewards (State Count, ICM, Maximum Entropy, and DIAYN), each imposing a different level of diversity, and clarifies their impact on exploration in MiniGrid environments.

Findings

01. State Count performs best with low-dimensional observations.

02. Maximum Entropy offers more robust exploration with RGB observations.

03. Learning diverse skills with DIAYN does not necessarily enhance exploration.

Abstract

One of the open challenges in Reinforcement Learning is the hard exploration problem in sparse reward environments. Various types of intrinsic rewards have been proposed to address this challenge by pushing towards diversity. This diversity might be imposed at different levels, encouraging the agent to explore different states, policies or behaviours (State, Policy and Skill level diversity, respectively). However, the impact of diversity on the agent's behaviour remains unclear. In this work, we aim to fill this gap by studying the effect of different levels of diversity imposed by intrinsic rewards on the exploration patterns of RL agents. We select four intrinsic rewards (State Count, Intrinsic Curiosity Module (ICM), Maximum Entropy, and Diversity Is All You Need (DIAYN)), each pushing for a different diversity level. We conduct an empirical study on the MiniGrid environment to compare…
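To make the idea of state-level diversity concrete, here is a minimal sketch of a count-based intrinsic reward of the kind the abstract calls State Count. It is an illustration under stated assumptions, not the paper's implementation: the bonus form `beta / sqrt(N(s))`, the `beta` scale, and the class name `StateCountBonus` are hypothetical choices, and states are assumed to be small hashable tuples (for example, an agent's grid position).

```python
import math
from collections import defaultdict


class StateCountBonus:
    """Hypothetical count-based bonus: beta / sqrt(N(s)).

    The exact bonus form and the beta scale are assumptions made for
    illustration; the source does not specify them.
    """

    def __init__(self, beta=1.0):
        self.beta = beta
        self.counts = defaultdict(int)  # visit count N(s) per state

    def __call__(self, state):
        key = tuple(state)      # make the state hashable
        self.counts[key] += 1   # count the current visit
        return self.beta / math.sqrt(self.counts[key])


bonus = StateCountBonus(beta=0.5)
first = bonus((1, 2))    # first visit to (1, 2)
second = bonus((1, 2))   # second visit to the same state
novel = bonus((3, 4))    # first visit to a different state
```

In this sketch the bonus shrinks as a state is revisited and returns to its maximum for an unseen state, which is what pushes the agent toward states it has visited less often. Exact counting of this kind is only practical when states are low-dimensional and discrete.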

Taxonomy

Topics: Reinforcement Learning in Robotics · Innovation Diffusion and Forecasting