On Learning Intrinsic Rewards for Policy Gradient Methods

Zeyu Zheng; Junhyuk Oh; Satinder Singh

arXiv:1804.06459 · cs.AI · June 25, 2018 · 33 citations

TL;DR

This paper introduces an algorithm for learning intrinsic rewards for policy-gradient reinforcement learning agents. On most Atari and MuJoCo tasks, agents using the learned intrinsic rewards outperform agents trained on extrinsic rewards alone.

Contribution

The paper develops a method for learning intrinsic rewards for policy-gradient agents. This extends the Optimal Rewards Framework, previously applied to lookahead-search planning agents, to learning agents.
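A compact statement of the idea follows. The notation is assumed here and is not taken from this page. θ are the policy parameters and η parameterizes the intrinsic reward r^in_η. λ weights the intrinsic reward against the extrinsic reward r^ex, and α is the policy step size. J^ex is the expected extrinsic return, and J^ex+in is the expected return under the mixed reward. The Optimal Rewards Framework asks for the η whose resulting policy maximizes J^ex. For a learning agent, one hedged way to make this tractable is to differentiate J^ex through a single policy-gradient step:

```latex
\begin{aligned}
\eta^{*} &= \arg\max_{\eta}\; J^{\mathrm{ex}}\!\left(\theta(\eta)\right),
  \qquad \theta(\eta)\ \text{learned from}\ r^{\mathrm{ex}} + \lambda\, r^{\mathrm{in}}_{\eta} \\
\theta' &= \theta + \alpha\, \nabla_{\theta} J^{\mathrm{ex+in}}(\theta; \eta) \\
\nabla_{\eta} J^{\mathrm{ex}}(\theta') &= \left(\nabla_{\eta}\theta'\right)^{\top} \nabla_{\theta'} J^{\mathrm{ex}}(\theta'),
  \qquad \nabla_{\eta}\theta' = \alpha\, \nabla_{\eta}\nabla_{\theta} J^{\mathrm{ex+in}}(\theta; \eta)
\end{aligned}
```

Under this reading, the policy keeps learning from the mixed reward. The intrinsic-reward parameters move along the extrinsic-return gradient of the updated policy.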

Findings

1. Improved performance on most Atari and MuJoCo tasks.
2. Intrinsic rewards improve learning efficiency.
3. Baseline agents trained with only extrinsic rewards performed worse.

Abstract

In many sequential decision-making tasks, it is challenging to design reward functions that help an RL agent efficiently learn behavior that is considered good by the agent designer. A number of different formulations of the reward-design problem, or close variants thereof, have been proposed in the literature. In this paper we build on the Optimal Rewards Framework of Singh et al., which defines the optimal intrinsic reward function as the one that, when used by an RL agent, achieves behavior that optimizes the task-specifying or extrinsic reward function. Previous work in this framework has shown how good intrinsic reward functions can be learned for lookahead search based planning agents. Whether it is possible to learn intrinsic reward functions for learning agents remains an open problem. In this paper we derive a novel algorithm for learning intrinsic rewards for policy-gradient based…
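The abstract is truncated before it describes the update. The sketch below is therefore a toy illustration under stated assumptions and should not be read as the authors' algorithm or the lirpg repository's code. It uses a 3-armed bandit with a softmax policy over logits `theta` and a tabular intrinsic reward `eta[a]`, mixed in as `r_ex[a] + lam * eta[a]`. It computes exact expected gradients instead of sampled trajectories. The policy takes one gradient step on the mixed reward. `eta` is then updated by differentiating the extrinsic return of the updated policy through that step. All function names and hyperparameters (`lam`, `alpha`, `beta`, `steps`) are hypothetical.

```python
import math

# Toy sketch (assumption): 3-armed bandit, softmax policy over logits theta,
# tabular intrinsic reward eta[a], mixed reward r_ex[a] + lam * eta[a].
# All names are hypothetical, not taken from the paper or its repository.

def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]

def expected_return(theta, rewards):
    """J(theta) = sum_a pi_theta(a) * R(a)."""
    return sum(p * r for p, r in zip(softmax(theta), rewards))

def policy_gradient(theta, rewards):
    """Exact gradient of J with respect to the logits: pi_j * (R_j - J)."""
    pi = softmax(theta)
    j = sum(p * r for p, r in zip(pi, rewards))
    return [p * (r - j) for p, r in zip(pi, rewards)]

def inner_update(theta, eta, r_ex, lam, alpha):
    """One policy-gradient step on the mixed (extrinsic + intrinsic) reward."""
    mixed = [rx + lam * e for rx, e in zip(r_ex, eta)]
    g = policy_gradient(theta, mixed)
    return [t + alpha * gi for t, gi in zip(theta, g)]

def meta_gradient(theta, eta, r_ex, lam, alpha):
    """d J_ex(theta') / d eta, differentiating through inner_update."""
    n = len(theta)
    pi = softmax(theta)
    theta_new = inner_update(theta, eta, r_ex, lam, alpha)
    outer = policy_gradient(theta_new, r_ex)  # d J_ex / d theta'
    # Jacobian d theta'_j / d eta_k = alpha * lam * pi_j * (delta_jk - pi_k)
    jac = [[alpha * lam * pi[j] * ((1.0 if j == k else 0.0) - pi[k])
            for k in range(n)] for j in range(n)]
    return [sum(outer[j] * jac[j][k] for j in range(n)) for k in range(n)]

def train(r_ex, steps=200, lam=1.0, alpha=0.5, beta=0.5):
    """Alternate: update the policy on the mixed reward, then the intrinsic reward."""
    theta = [0.0] * len(r_ex)
    eta = [0.0] * len(r_ex)
    for _ in range(steps):
        grad_eta = meta_gradient(theta, eta, r_ex, lam, alpha)
        theta = inner_update(theta, eta, r_ex, lam, alpha)
        eta = [e + beta * g for e, g in zip(eta, grad_eta)]
    return theta, eta

if __name__ == "__main__":
    theta, eta = train([1.0, 0.0, 0.5])
    print("policy:", [round(p, 3) for p in softmax(theta)])
    print("intrinsic reward:", [round(e, 3) for e in eta])
```

The analytic Jacobian is simple here only because `eta` enters the mixed reward linearly and the policy is tabular. A deep-RL version would typically rely on automatic differentiation through the policy update instead.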

Code & Models

Repositories

Hwhitetooth/lirpg · TensorFlow · Official

Taxonomy

Topics: Reinforcement Learning in Robotics · Advanced Bandit Algorithms Research · Artificial Intelligence in Games