TL;DR
This paper introduces a reinforcement learning method that combines learned latent-space representations with evolutionary planning. It enables effective decision-making without requiring a pre-existing model of the environment; the world model is learned from rollouts instead.
Contribution
The paper presents a framework called Evolutionary Planning in Latent Space (EPLS). It learns a stochastic world model in latent space and plans over that model with an evolutionary algorithm, Random Mutation Hill Climbing (RMHC), improving over standard model-free methods.
Findings
EPLS outperforms standard model-free reinforcement learning agents.
The learned world model enables effective planning in complex environments.
Iterative refinement improves the accuracy of the world model and planning performance.
Abstract
Planning is a powerful approach to reinforcement learning with several desirable properties. However, it requires a model of the world, which is not readily available in many real-life problems. In this paper, we propose to learn a world model that enables Evolutionary Planning in Latent Space (EPLS). We use a Variational Auto Encoder (VAE) to learn a compressed latent representation of individual observations and extend a Mixture Density Recurrent Neural Network (MDRNN) to learn a stochastic, multi-modal forward model of the world that can be used for planning. We use Random Mutation Hill Climbing (RMHC) to find a sequence of actions that maximizes expected reward in this learned model of the world. We demonstrate how to build a model of the world by bootstrapping it with rollouts from a random policy and iteratively refining it with rollouts from an increasingly accurate planning…
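The planning step described in the abstract can be illustrated with a minimal sketch of Random Mutation Hill Climbing over action sequences. This is not the authors' implementation: the toy one-dimensional `toy_model`, the function names, and the parameters `horizon`, `generations`, `mutation_rate`, and `n_eval` are all assumptions made for illustration. In EPLS the forward model would be the learned MDRNN operating on VAE latents; here a hand-written deterministic function stands in for it.

```python
import random
from typing import Callable, List, Optional, Sequence, Tuple

# Hypothetical forward-model signature: (state, action) -> (next_state, reward).
Model = Callable[[float, int], Tuple[float, float]]


def rollout_return(model: Model, state: float, plan: Sequence[int]) -> float:
    """Sum of rewards obtained by executing `plan` inside the model."""
    total = 0.0
    for action in plan:
        state, reward = model(state, action)
        total += reward
    return total


def rmhc_plan(
    model: Model,
    state: float,
    actions: Sequence[int],
    horizon: int = 10,
    generations: int = 200,
    mutation_rate: float = 0.2,
    n_eval: int = 1,
    rng: Optional[random.Random] = None,
) -> Tuple[List[int], float]:
    """Random Mutation Hill Climbing: mutate one plan, keep it if not worse."""
    rng = rng or random.Random()

    def score(plan: Sequence[int]) -> float:
        # Averaging over n_eval rollouts estimates expected return when the
        # model is stochastic (an assumption about how one might evaluate).
        return sum(rollout_return(model, state, plan) for _ in range(n_eval)) / n_eval

    best = [rng.choice(actions) for _ in range(horizon)]
    best_score = score(best)
    for _ in range(generations):
        # Resample each action independently with probability mutation_rate.
        candidate = [
            rng.choice(actions) if rng.random() < mutation_rate else a
            for a in best
        ]
        candidate_score = score(candidate)
        if candidate_score >= best_score:
            best, best_score = candidate, candidate_score
    return best, best_score


def toy_model(state: float, action: int) -> Tuple[float, float]:
    """Stand-in for a learned model: move on a line, reward is -distance to 3."""
    next_state = state + action
    return next_state, -abs(next_state - 3.0)


if __name__ == "__main__":
    plan, value = rmhc_plan(
        toy_model, state=0.0, actions=[-1, 0, 1], horizon=5, rng=random.Random(0)
    )
    print(plan, value)
```

Accepting candidates that tie the current best (`>=`) lets the search drift across plateaus, which is a common choice for hill climbers but is an assumption here rather than something the abstract states. In a receding-horizon setup, an agent would typically execute only the first action of the returned plan and then replan from the new state.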
