Improving Generalization in Meta Reinforcement Learning using Learned Objectives

Louis Kirsch; Sjoerd van Steenkiste; Jürgen Schmidhuber

arXiv:1910.04098 · cs.LG · February 17, 2020 · 59 cites

Open Access

TL;DR

MetaGenRL is a novel meta reinforcement learning algorithm that meta-learns a neural objective function which generalizes to environments entirely different from those seen during meta-training. In some cases it outperforms human-designed RL algorithms, and its use of off-policy second-order gradients makes meta-training more sample-efficient.

Contribution

It introduces MetaGenRL, a meta-RL algorithm inspired by biological evolution that meta-learns a neural objective function to improve generalization and sample efficiency across diverse environments.

Findings

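01. MetaGenRL generalizes to new environments that are entirely different from those used for meta-training.

02. In some cases, it outperforms human-engineered RL algorithms.

03. It uses off-policy second-order gradients during meta-training, which greatly increase its sample efficiency.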

Abstract

Biological evolution has distilled the experiences of many learners into the general learning algorithms of humans. Our novel meta reinforcement learning algorithm MetaGenRL is inspired by this process. MetaGenRL distills the experiences of many complex agents to meta-learn a low-complexity neural objective function that decides how future individuals will learn. Unlike recent meta-RL algorithms, MetaGenRL can generalize to new environments that are entirely different from those used for meta-training. In some cases, it even outperforms human-engineered RL algorithms. MetaGenRL uses off-policy second-order gradients during meta-training that greatly increase its sample efficiency.
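The abstract's central mechanism is training the parameters of a learned objective by differentiating through the policy update that objective produces, which is where second-order gradients arise. It can be illustrated with a deliberately tiny example. This is a minimal sketch under stated assumptions, not MetaGenRL's implementation: the scalar linear policy `phi`, the fixed quadratic critic stand-in `Q(s, a) = -(a - 2s)**2`, the two-parameter objective `L_alpha`, and all function names are hypothetical replacements for the neural networks the paper describes. The stored states only mimic an off-policy replay buffer.

```python
import random

# Hypothetical toy setup (an illustration, not the authors' code):
#   policy:            a = phi * s                  (scalar, deterministic, linear)
#   critic stand-in:   Q(s, a) = -(a - 2*s)**2      (fixed; the best gain is phi = 2)
#   learned objective: L_alpha = -alpha[0]*Q(s, a) + alpha[1]*a**2
# alpha is meta-trained so that ONE gradient step on L_alpha raises Q of the updated policy.


def dq_da(s, a):
    """Derivative of the toy critic Q(s, a) = -(a - 2s)^2 with respect to the action a."""
    return -2.0 * (a - 2.0 * s)


def policy_grad(phi, alpha, s):
    """dL_alpha/dphi for one state, via the chain rule through a = phi * s."""
    a = phi * s
    return (-alpha[0] * dq_da(s, a) + 2.0 * alpha[1] * a) * s


def inner_update(phi, alpha, states, lr):
    """One gradient step on the learned objective, averaged over stored states."""
    return phi - lr * sum(policy_grad(phi, alpha, s) for s in states) / len(states)


def meta_grad(phi, alpha, states, lr):
    """Gradient w.r.t. alpha of mean Q(s, phi_new * s), where phi_new = inner_update(...).

    Backpropagating through the inner update needs the mixed second derivatives
    d^2 L_alpha / (d alpha_i d phi); in this toy they are written out by hand.
    """
    n = len(states)
    phi_new = inner_update(phi, alpha, states, lr)
    # d(mean Q after the update) / d phi_new
    dj_dphi_new = sum(dq_da(s, phi_new * s) * s for s in states) / n
    # d(mean policy_grad) / d alpha_i: the second-order terms
    d_pg_da0 = sum(-dq_da(s, phi * s) * s for s in states) / n
    d_pg_da1 = sum(2.0 * phi * s * s for s in states) / n
    # phi_new = phi - lr * mean_pg, so d phi_new / d alpha_i = -lr * d_pg_dai
    return [dj_dphi_new * (-lr * d_pg_da0), dj_dphi_new * (-lr * d_pg_da1)]


random.seed(0)
replay_states = [random.uniform(-1.0, 1.0) for _ in range(64)]  # stands in for a replay buffer
alpha, lr, meta_lr = [0.0, 0.0], 0.1, 1.0

for _ in range(2000):
    phi = random.uniform(-3.0, 3.0)  # a fresh hypothetical agent each meta-step
    batch = random.sample(replay_states, 16)
    g = meta_grad(phi, alpha, batch, lr)
    alpha = [alpha[0] + meta_lr * g[0], alpha[1] + meta_lr * g[1]]  # gradient ascent on mean Q

print("learned alpha:", alpha)
```

The derivatives are written by hand to keep the sketch dependency-free. With neural networks, one would normally let an automatic-differentiation library compute the mixed second derivatives. In this toy, the meta-objective should drive `alpha[1]` toward zero and scale `alpha[0]` so that a single update moves any starting `phi` close to the optimal gain of 2.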

Taxonomy

Topics: Reinforcement Learning in Robotics · Evolutionary Algorithms and Applications · Adaptive Dynamic Programming Control