What Can Learned Intrinsic Rewards Capture?

Zeyu Zheng; Junhyuk Oh; Matteo Hessel; Zhongwen Xu; Manuel Kroiss; Hado van Hasselt; David Silver; Satinder Singh

arXiv:1912.05500 · cs.AI · August 25, 2020 · 5 citations

TL;DR

This paper introduces a scalable meta-gradient framework for learning intrinsic reward functions in reinforcement learning, showing that a learned reward function can capture long-term exploration and exploitation knowledge that generalizes across agents and across changes in the environment.

Contribution

It proposes a novel meta-gradient approach to learning intrinsic rewards, shifting the focus from transferring how an agent should behave (as policy transfer methods do) to capturing what the agent should strive for.

Findings

1. It is feasible to learn long-term exploration and exploitation knowledge and encode it in a reward function.
2. Learned reward functions generalize across different kinds of agents.
3. Learned reward functions generalize to changes in the environment dynamics.

Abstract

The objective of a reinforcement learning agent is to behave so as to maximise the sum of a suitable scalar function of state: the reward. These rewards are typically given and immutable. In this paper, we instead consider the proposition that the reward function itself can be a good locus of learned knowledge. To investigate this, we propose a scalable meta-gradient framework for learning useful intrinsic reward functions across multiple lifetimes of experience. Through several proof-of-concept experiments, we show that it is feasible to learn and capture knowledge about long-term exploration and exploitation into a reward function. Furthermore, we show that unlike policy transfer methods that capture "how" the agent should behave, the learned reward functions can generalise to other kinds of agents and to changes in the dynamics of the environment by capturing "what" the agent should…
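To make the meta-gradient idea in the abstract concrete, here is a minimal toy sketch in Python with NumPy. It is not the authors' implementation, and every name in it is hypothetical. It assumes a single-state bandit, a softmax policy with logits theta, an intrinsic reward given by a learnable per-arm table eta (so r_eta(a) = eta[a]), exact expected policy gradients instead of sampled ones, and a single inner update per lifetime. The inner step improves the policy on the intrinsic reward only; the outer step adjusts eta by differentiating the extrinsic reward of the updated policy back through that inner step.

```python
import numpy as np


def softmax(logits):
    """Numerically stable softmax of a 1-D array of logits."""
    z = np.exp(logits - logits.max())
    return z / z.sum()


def softmax_jacobian(probs):
    """d softmax(theta) / d theta = diag(p) - p p^T, a symmetric matrix."""
    return np.diag(probs) - np.outer(probs, probs)


def inner_update(theta, eta, alpha):
    """One exact policy-gradient step on the intrinsic reward r_eta(a) = eta[a].

    grad_theta E_{a~pi}[eta[a]] = (diag(pi) - pi pi^T) @ eta.
    The extrinsic reward is never seen here.
    """
    pi = softmax(theta)
    return theta + alpha * softmax_jacobian(pi) @ eta


def outer_objective(theta, eta, alpha, r_ext):
    """Expected extrinsic reward of the policy after the inner update."""
    return softmax(inner_update(theta, eta, alpha)) @ r_ext


def meta_gradient(theta, eta, alpha, r_ext):
    """Analytic d outer_objective / d eta via the chain rule.

    d theta' / d eta = alpha * S(pi) and d J / d theta' = S(pi') @ r_ext,
    where S(p) = diag(p) - p p^T; both S matrices are symmetric.
    """
    pi = softmax(theta)
    pi_new = softmax(inner_update(theta, eta, alpha))
    return alpha * softmax_jacobian(pi) @ (softmax_jacobian(pi_new) @ r_ext)


def learn_intrinsic_reward(r_ext, alpha=1.0, beta=0.5, lifetimes=200):
    """Outer loop: every lifetime starts a fresh agent (theta = 0, uniform
    policy), and only the intrinsic-reward parameters eta persist."""
    eta = np.zeros_like(r_ext)
    for _ in range(lifetimes):
        theta = np.zeros_like(r_ext)  # new agent for this lifetime
        eta = eta + beta * meta_gradient(theta, eta, alpha, r_ext)
    return eta


r_ext = np.array([0.1, 0.9, 0.4])  # hypothetical extrinsic arm rewards
eta = learn_intrinsic_reward(r_ext)
print(int(np.argmax(eta)))  # → 1 (the arm with the highest extrinsic reward)
```

In this fixed-reward toy the learned eta simply ranks arms the way r_ext does. Capturing exploration knowledge, as in the paper's experiments, would require something richer, such as extrinsic rewards that vary across lifetimes and an intrinsic reward that can condition on the agent's history, with the meta-gradient taken through longer unrolls.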

Videos

What Can Learned Intrinsic Rewards Capture? · SlidesLive

Taxonomy

Topics: Reinforcement Learning in Robotics · Advanced Bandit Algorithms Research · Neural dynamics and brain function