Recurrent World Models Facilitate Policy Evolution

David Ha, Jürgen Schmidhuber

arXiv:1809.01999·cs.LG·September 7, 2018·407 cites

TL;DR

This paper trains a generative recurrent neural network, without supervision, to model reinforcement learning environments. Compact policies are then evolved on the model's extracted features, including a policy trained entirely inside the model's own generated environment and transferred back to the actual one.

Contribution

The paper combines unsupervised world modeling with evolutionary policy training and with simulation inside the learned model.

Findings

01 Achieved state-of-the-art results in various environments

02 Trained an agent entirely inside an environment generated by its own world model

03 Transferred the resulting policy back into the actual environment

Abstract

A generative recurrent neural network is quickly trained in an unsupervised manner to model popular reinforcement learning environments through compressed spatio-temporal representations. The world model's extracted features are fed into compact and simple policies trained by evolution, achieving state of the art results in various environments. We also train our agent entirely inside of an environment generated by its own internal world model, and transfer this policy back into the actual environment. Interactive version of paper at https://worldmodels.github.io
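
The loop the abstract describes (model features feeding a compact policy, trained by evolution inside the model's own generated environment) can be sketched in a few lines. This is only an illustration under loud assumptions: `toy_model_step` is a hypothetical hand-written stand-in for a learned recurrent model, the policy is a three-parameter linear map, and the optimizer is a basic elitist evolution strategy chosen for brevity, not necessarily the method used in the paper.

```python
import random


def linear_policy(params, features):
    """Compact linear controller: weighted sum of the features plus a bias."""
    *weights, bias = params
    return sum(w * f for w, f in zip(weights, features)) + bias


def toy_model_step(state, action):
    """Hypothetical stand-in for a learned world model's next-state prediction."""
    position, velocity = state
    velocity = 0.9 * velocity + 0.1 * action
    position = position + velocity
    return (position, velocity)


def rollout_return(params, steps=30):
    """Roll the policy out inside the toy model; reward favors position near 0."""
    state = (1.0, 0.0)
    total = 0.0
    for _ in range(steps):
        # Clip the action to [-1, 1] before stepping the model.
        action = max(-1.0, min(1.0, linear_policy(params, state)))
        state = toy_model_step(state, action)
        total -= state[0] ** 2
    return total


def evolve(fitness, dim, generations=40, population=16, sigma=0.3, seed=0):
    """Elitist evolution strategy: perturb the parent, keep the best child if it improves."""
    rng = random.Random(seed)
    parent = [0.0] * dim
    parent_fit = fitness(parent)
    for _ in range(generations):
        children = [
            [p + sigma * rng.gauss(0.0, 1.0) for p in parent]
            for _ in range(population)
        ]
        scored = [(fitness(child), child) for child in children]
        best_fit, best_child = max(scored, key=lambda pair: pair[0])
        if best_fit > parent_fit:
            parent, parent_fit = best_child, best_fit
    return parent, parent_fit


if __name__ == "__main__":
    baseline = rollout_return([0.0, 0.0, 0.0])
    best_params, best_return = evolve(rollout_return, dim=3)
    print("baseline return:", baseline)
    print("evolved return:", best_return)
```

Because the update is elitist, the evolved return is never worse than the return of the all-zero starting parameters. Transferring the policy "back into the actual environment" would correspond to evaluating `best_params` against the real simulator instead of `toy_model_step`, which this sketch does not include.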

Taxonomy

Topics: International Development and Aid