Recurrent World Models Facilitate Policy Evolution
David Ha, Jürgen Schmidhuber

TL;DR
This paper introduces a generative recurrent neural network that learns compressed spatio-temporal models of reinforcement learning environments. Compact policies are evolved on the features the model extracts, and a policy trained entirely inside the model's generated world is transferred back to the actual environment.
Contribution
It presents a novel approach combining unsupervised world modeling with evolutionary policy training and internal environment simulation.
Findings
Achieved state-of-the-art results in multiple environments
Successfully trained policies entirely inside generated worlds
Demonstrated effective transfer of policies to real environments
Abstract
A generative recurrent neural network is quickly trained in an unsupervised manner to model popular reinforcement learning environments through compressed spatio-temporal representations. The world model's extracted features are fed into compact and simple policies trained by evolution, achieving state-of-the-art results in various environments. We also train our agent entirely inside of an environment generated by its own internal world model, and transfer this policy back into the actual environment. An interactive version of the paper is available at https://worldmodels.github.io
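To make the idea of "compact and simple policies trained by evolution" inside a model-generated environment more concrete, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not the authors' implementation: `toy_model_step` is a hypothetical hand-written stand-in for a learned world model (the paper's model is a generative recurrent network), the reward is invented for the example, and the elite-averaging evolution strategy is a deliberately simple substitute that may differ from the optimizer the authors used. All function names are hypothetical.

```python
import random


def linear_policy(params, features):
    """Compact controller: clipped weighted sum of the features plus a bias."""
    *weights, bias = params
    raw = sum(w * f for w, f in zip(weights, features)) + bias
    return max(-1.0, min(1.0, raw))


def toy_model_step(state, action):
    """Hypothetical stand-in for a learned world model (assumption, not the
    paper's network): a 1-D point whose velocity is nudged by the action."""
    position, velocity = state
    velocity = 0.9 * velocity + 0.1 * action
    position = position + velocity
    return (position, velocity)


def rollout_return(params, steps=50):
    """Roll the policy out entirely inside the toy model.
    Invented reward: negative distance from the origin at every step."""
    state = (1.0, 0.0)
    total = 0.0
    for _ in range(steps):
        action = linear_policy(params, state)
        state = toy_model_step(state, action)
        total -= abs(state[0])
    return total


def evolve(num_params=3, population=32, elite=8, generations=40,
           sigma=0.5, seed=0):
    """Simple evolution strategy: sample around a mean, keep the elites,
    move the mean to the elite average. Returns the best parameters seen."""
    rng = random.Random(seed)
    mean = [0.0] * num_params
    best_params, best_score = list(mean), rollout_return(mean)
    for _ in range(generations):
        candidates = [
            [m + sigma * rng.gauss(0.0, 1.0) for m in mean]
            for _ in range(population)
        ]
        ranked = sorted(candidates, key=rollout_return, reverse=True)
        elites = ranked[:elite]
        mean = [sum(c[i] for c in elites) / elite for i in range(num_params)]
        top_score = rollout_return(elites[0])
        if top_score > best_score:
            best_params, best_score = elites[0], top_score
    return best_params, best_score


if __name__ == "__main__":
    params, score = evolve()
    print("best parameters:", params)
    print("return inside the toy model:", score)
```

The policy has only three parameters, so a gradient-free search over them is cheap. In the setting the abstract describes, the policy found this way would then be evaluated in the actual environment; the sketch stops short of that transfer step.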
