Enhanced POET: Open-Ended Reinforcement Learning through Unbounded Invention of Learning Challenges and their Solutions
Rui Wang, Joel Lehman, Aditya Rawal, Jiale Zhi, Yulun Li, Jeff Clune, Kenneth O. Stanley

TL;DR
This paper advances open-ended reinforcement learning by enhancing the POET algorithm with new measures of novelty and goal-switching heuristics, enabling it to generate and solve a broader, more diverse set of challenges.
Contribution
The paper introduces four key innovations, including a domain-general novelty measure and a scalable goal-switching heuristic, significantly improving POET's open-ended exploration capabilities.
Findings
Enhanced POET demonstrates unprecedented diversity in challenge-solving behaviors
The new measures enable continuous generation of meaningful novel challenges
The algorithm scales better and explores more complex environments
Abstract
Creating open-ended algorithms, which generate their own never-ending stream of novel and appropriately challenging learning opportunities, could help to automate and accelerate progress in machine learning. A recent step in this direction is the Paired Open-Ended Trailblazer (POET), an algorithm that generates and solves its own challenges, and allows solutions to goal-switch between challenges to avoid local optima. However, the original POET was unable to demonstrate its full creative potential because of limitations of the algorithm itself and because of external issues including a limited problem space and lack of a universal progress measure. Importantly, both limitations pose impediments not only for POET, but for the pursuit of open-endedness in general. Here we introduce and empirically validate two new innovations to the original algorithm, as well as two external innovations…
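To make the loop described in the abstract concrete, the following is a minimal toy sketch of a POET-style cycle: generate new challenges, optimize each paired solution, and let solutions goal-switch between challenges. It is not the authors' implementation. The one-dimensional "environment" (a target number), the hill-climbing optimizer, the acceptance band for new environments, and all names (`score`, `optimize`, `poet_sketch`, `max_pairs`) are assumptions made purely for illustration.

```python
import random


def score(theta, target):
    """Toy reward (assumed): higher is better, 0 when theta equals the target."""
    return -abs(theta - target)


def optimize(theta, target, rng, steps=20, sigma=0.1):
    """Hill climbing, standing in for whatever optimizer trains the agent."""
    best = theta
    for _ in range(steps):
        candidate = best + rng.gauss(0.0, sigma)
        if score(candidate, target) > score(best, target):
            best = candidate
    return best


def poet_sketch(iterations=30, max_pairs=5, seed=0):
    """Run a toy POET-style loop over (environment, agent) pairs."""
    rng = random.Random(seed)
    pairs = [(0.0, 0.0)]  # each pair is (environment target, agent parameter)
    for it in range(iterations):
        # 1. Every few iterations, propose a new challenge by mutating an
        #    existing environment; the child starts with its parent's agent.
        if it % 5 == 0:
            parent_target, parent_theta = rng.choice(pairs)
            child_target = parent_target + rng.uniform(-1.0, 1.0)
            s = score(parent_theta, child_target)
            # Assumed acceptance band: neither too hard nor too easy.
            if -1.0 <= s <= -0.1:
                pairs.append((child_target, parent_theta))
                pairs = pairs[-max_pairs:]  # keep only the newest pairs
        # 2. Optimize every agent in its own paired environment.
        pairs = [(t, optimize(th, t, rng)) for t, th in pairs]
        # 3. Goal-switching: each environment adopts the best agent found
        #    anywhere in the current population.
        thetas = [th for _, th in pairs]
        pairs = [(t, max(thetas, key=lambda th: score(th, t))) for t, _ in pairs]
    return pairs


if __name__ == "__main__":
    for target, theta in poet_sketch():
        print(f"target={target:+.3f}  agent={theta:+.3f}  score={score(theta, target):.3f}")
```

In this sketch, step 3 is what lets a solution trained on one challenge replace a stalled solution on another, which is the mechanism the abstract credits with avoiding local optima. The real algorithm operates on far richer environments and agents than the scalars used here.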
Taxonomy
Topics: Reinforcement Learning in Robotics · Robot Manipulation and Learning · Modular Robots and Swarm Intelligence
