Learning in Sparse Rewards settings through Quality-Diversity algorithms
Giuseppe Paolo

TL;DR
This thesis introduces algorithms that combine Quality-Diversity methods with representation learning to improve exploration in sparse-reward reinforcement learning, reducing the need for prior information and enabling autonomous discovery of high-performance policies.
Contribution
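It proposes three algorithms: TAXONS, which learns a low-dimensional representation of the space in which policy diversity is evaluated; SERENE, which separates exploration from reward exploitation; and STAX, which combines the two to explore more efficiently under sparse rewards.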
Findings
TAXONS effectively learns low-dimensional policy representations.
SERENE improves exploration by separating search and reward exploitation.
STAX outperforms baseline methods in sparse reward environments.
Abstract
In the Reinforcement Learning (RL) framework, learning is guided by a reward signal. When rewards are sparse, the agent must therefore focus on exploration to discover which action, or set of actions, leads to the reward. RL agents usually struggle with this. Exploration is the focus of Quality-Diversity (QD) methods. In this thesis, we approach the problem of sparse rewards with these algorithms, and in particular with Novelty Search (NS), a method that focuses only on the diversity of the policies' behaviors. The first part of the thesis focuses on learning a representation of the space in which the diversity of the policies is evaluated. In this regard, we propose the TAXONS algorithm, a method that learns a low-dimensional representation of the search space through an AutoEncoder. While effective, TAXONS still requires information on…
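To make the idea of rewarding only behavioral diversity concrete, the sketch below scores each policy by the mean distance from its behavior descriptor to its k nearest neighbors, which is the formulation commonly used for Novelty Search. It is an illustration under stated assumptions, not the thesis's implementation: the 2-D descriptors, the function names `novelty_score` and `select_most_novel`, and the values of `k` are all hypothetical choices made for this example.

```python
import math
from typing import List, Sequence

Descriptor = Sequence[float]


def novelty_score(behavior: Descriptor, others: List[Descriptor], k: int = 3) -> float:
    """Mean Euclidean distance from `behavior` to its k nearest neighbors.

    `others` holds the descriptors to compare against (assumed here to be
    the rest of the population plus an archive of past behaviors).
    """
    if not others:
        raise ValueError("need at least one other descriptor to compare against")
    distances = sorted(math.dist(behavior, other) for other in others)
    nearest = distances[:k]  # fewer than k neighbors is tolerated
    return sum(nearest) / len(nearest)


def select_most_novel(
    population: List[Descriptor],
    archive: List[Descriptor],
    k: int = 3,
    n_select: int = 2,
) -> List[int]:
    """Return indices of the `n_select` most novel members of `population`.

    No reward is consulted: selection depends only on how different each
    behavior is from the others, which is the point of Novelty Search.
    """
    scores = []
    for i, behavior in enumerate(population):
        # Compare against everyone else in the population and the archive.
        reference = [b for j, b in enumerate(population) if j != i] + list(archive)
        scores.append(novelty_score(behavior, reference, k))
    ranked = sorted(range(len(population)), key=lambda i: scores[i], reverse=True)
    return ranked[:n_select]


if __name__ == "__main__":
    # Hypothetical 2-D behavior descriptors, e.g. final (x, y) positions.
    population = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0)]
    archive = [(0.5, 0.5)]
    print(select_most_novel(population, archive, k=2, n_select=1))
```

In this sketch the descriptors are hand-defined. The abstract's motivation for TAXONS is to replace such a hand-designed space with a learned low-dimensional representation, so the same distance computation would then operate on features produced by an AutoEncoder.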
Taxonomy
Topics: Reinforcement Learning in Robotics · Control Systems and Identification · Advanced Control Systems Optimization
