What Can Learned Intrinsic Rewards Capture?
Zeyu Zheng, Junhyuk Oh, Matteo Hessel, Zhongwen Xu, Manuel Kroiss, Hado van Hasselt, David Silver, Satinder Singh

TL;DR
This paper introduces a scalable meta-gradient framework for learning intrinsic reward functions in reinforcement learning. The learned reward functions capture knowledge about long-term exploration and exploitation, and they generalize to other kinds of agents and to changes in environment dynamics.
Contribution
The paper proposes a meta-gradient approach for learning intrinsic rewards across multiple lifetimes of experience, shifting the focus from "how" an agent should behave (as in policy transfer) to "what" it should strive for.
Findings
It is feasible to learn a reward function that captures knowledge about long-term exploration and exploitation
Learned reward functions generalize to other kinds of agents
Learned reward functions generalize to changes in environment dynamics
Abstract
The objective of a reinforcement learning agent is to behave so as to maximise the sum of a suitable scalar function of state: the reward. These rewards are typically given and immutable. In this paper, we instead consider the proposition that the reward function itself can be a good locus of learned knowledge. To investigate this, we propose a scalable meta-gradient framework for learning useful intrinsic reward functions across multiple lifetimes of experience. Through several proof-of-concept experiments, we show that it is feasible to learn and capture knowledge about long-term exploration and exploitation into a reward function. Furthermore, we show that unlike policy transfer methods that capture "how" the agent should behave, the learned reward functions can generalise to other kinds of agents and to changes in the dynamics of the environment by capturing "what" the agent should…
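The idea of learning an intrinsic reward across lifetimes can be illustrated with a toy sketch. This is not the paper's implementation: the three-armed bandit, the function names (`lifetime`, `meta_gradient`, `learn_intrinsic_reward`), the step sizes, and the use of exact expected-value updates are all assumptions made for illustration. In particular, the meta-gradient here is approximated by central finite differences rather than by backpropagating through the agent's updates. In each lifetime a fresh softmax policy is trained only on the intrinsic reward `eta`, and `eta` is then adjusted to increase the extrinsic return that the trained policy achieves.

```python
import math

def softmax(theta):
    # Numerically stable softmax over a list of preferences.
    m = max(theta)
    exps = [math.exp(t - m) for t in theta]
    z = sum(exps)
    return [e / z for e in exps]

def lifetime(eta, steps=20, alpha=0.5):
    # Inner loop: a fresh policy is trained by exact policy-gradient ascent
    # on the expected INTRINSIC reward only. Returns final action probabilities.
    theta = [0.0] * len(eta)
    for _ in range(steps):
        p = softmax(theta)
        baseline = sum(pa * ea for pa, ea in zip(p, eta))
        theta = [t + alpha * pa * (ea - baseline)
                 for t, pa, ea in zip(theta, p, eta)]
    return softmax(theta)

def lifetime_return(eta, extrinsic):
    # Outer objective: expected EXTRINSIC reward of the policy at the end
    # of a lifetime spent optimising the intrinsic reward eta.
    p = lifetime(eta)
    return sum(pa * ra for pa, ra in zip(p, extrinsic))

def meta_gradient(eta, extrinsic, eps=1e-4):
    # Assumption: central finite differences stand in for backpropagation
    # through the inner-loop updates.
    grad = []
    for i in range(len(eta)):
        up = list(eta)
        up[i] += eps
        down = list(eta)
        down[i] -= eps
        diff = lifetime_return(up, extrinsic) - lifetime_return(down, extrinsic)
        grad.append(diff / (2 * eps))
    return grad

def learn_intrinsic_reward(extrinsic, meta_steps=200, beta=1.0):
    # Outer loop: gradient ascent on eta across many simulated lifetimes.
    eta = [0.0] * len(extrinsic)
    for _ in range(meta_steps):
        g = meta_gradient(eta, extrinsic)
        eta = [e + beta * gi for e, gi in zip(eta, g)]
    return eta

extrinsic = [0.0, 1.0, 0.2]          # hypothetical extrinsic reward per arm
eta = learn_intrinsic_reward(extrinsic)
probs = lifetime(eta)                # a new agent trained on eta alone
print("learned intrinsic reward:", eta)
print("resulting policy:", probs)
```

Because the new agent in the last two lines never sees `extrinsic`, any preference it develops for the best arm comes entirely from the learned `eta`, which is the sense in which the reward function acts as the locus of knowledge in this sketch.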
Taxonomy
Topics: Reinforcement Learning in Robotics · Advanced Bandit Algorithms Research · Neural dynamics and brain function
