Reward prediction for representation learning and reward shaping

Hlynur Davíð Hlynsson; Laurenz Wiskott

arXiv:2105.03172·cs.LG·May 10, 2021·1 cites

Open Access

TL;DR

This paper introduces a self-supervised reward prediction method to improve data efficiency in reinforcement learning, especially in high-dimensional and sparse reward environments, by enhancing representations and shaping rewards.

Contribution

It proposes a novel self-supervised reward prediction approach for representation learning and reward shaping, improving RL performance in visual, goal-oriented tasks.

Findings

01 Significantly improves RL performance with visual inputs

02 Enhances data efficiency in sparse reward settings

03 Effective with Actor Critic and PPO algorithms

Abstract

A fundamental challenge in reinforcement learning (RL) is data efficiency: modern algorithms require a very large number of training samples, especially compared to humans, to solve environments with high-dimensional observations. The problem becomes more severe when the reward signal is sparse. In this work, we propose learning a state representation in a self-supervised manner for reward prediction. The reward predictor learns to estimate either a raw or a smoothed version of the true reward signal in environments with a single, terminating goal state. We augment the training of out-of-the-box RL agents by shaping the reward using our reward predictor during policy learning. Using our representation for preprocessing high-dimensional observations, as well as using the predictor for reward shaping, is shown to significantly enhance Actor Critic using…
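The two ideas named in the abstract, a smoothed reward target and reward shaping with a predictor, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the exponential back-propagation of the goal reward, the `decay` and `scale` parameters, and the function names are hypothetical and are not taken from the paper, whose actual smoothing and shaping schemes may differ.

```python
import numpy as np


def smoothed_reward_targets(rewards, decay=0.9):
    """Spread a sparse goal reward backward along one episode.

    Assumed smoothing scheme (not the paper's): each step's target is the
    larger of its own reward and the decayed target of the following step.
    """
    rewards = np.asarray(rewards, dtype=float)
    targets = np.zeros_like(rewards)
    running = 0.0
    # Walk backward from the terminating goal state.
    for t in range(len(rewards) - 1, -1, -1):
        running = max(rewards[t], decay * running)
        targets[t] = running
    return targets


def shaped_reward(env_reward, predicted_reward, scale=0.1):
    """Add a scaled predictor output to the environment reward.

    `scale` is a hypothetical weighting, chosen here only for illustration.
    """
    return env_reward + scale * predicted_reward


# Sparse episode: reward only at the single terminating goal state.
episode_rewards = [0.0, 0.0, 0.0, 1.0]
targets = smoothed_reward_targets(episode_rewards, decay=0.5)
print(targets)

# A predictor trained on such targets would give non-zero signal before the goal.
print(shaped_reward(env_reward=0.0, predicted_reward=targets[2], scale=0.1))
```

In this sketch the smoothed targets stand in for what a trained reward predictor might output; in practice the predictor would be a learned function of the observation, and its output would be passed to `shaped_reward` at each step of policy learning.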

Taxonomy

Topics: Reinforcement Learning in Robotics · Adversarial Robustness in Machine Learning · Explainable Artificial Intelligence (XAI)