Reward prediction for representation learning and reward shaping
Hlynur Davíð Hlynsson, Laurenz Wiskott

TL;DR
This paper introduces a self-supervised reward prediction method to improve data efficiency in reinforcement learning, especially in high-dimensional and sparse reward environments, by enhancing representations and shaping rewards.
Contribution
It proposes a novel self-supervised reward prediction approach for representation learning and reward shaping, improving RL performance in visual, goal-oriented tasks.
Findings
Significantly improves RL performance with visual inputs
Enhances data efficiency in sparse reward settings
Effective with Actor Critic and PPO algorithms
Abstract
One of the fundamental challenges in reinforcement learning (RL) is data efficiency: modern algorithms require a very large number of training samples, especially compared to humans, to solve environments with high-dimensional observations. The problem becomes more severe when the reward signal is sparse. In this work, we propose learning a state representation in a self-supervised manner for reward prediction. The reward predictor learns to estimate either a raw or a smoothed version of the true reward signal in environments with a single, terminating goal state. We augment the training of out-of-the-box RL agents by shaping the reward using our reward predictor during policy learning. Using our representation for preprocessing high-dimensional observations, as well as using the predictor for reward shaping, is shown to significantly enhance Actor Critic using…
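The two ideas named in the abstract, a smoothed reward target and reward shaping with a learned predictor, can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the exponential backward smoothing, the additive form of the shaping, and the names `smooth_sparse_rewards`, `shaped_reward`, `decay`, and `weight` are all hypothetical.

```python
from typing import Callable, List, Sequence


def smooth_sparse_rewards(rewards: Sequence[float], decay: float = 0.9) -> List[float]:
    """Spread a sparse terminal reward backwards along one episode.

    Assumption: exponential decay toward earlier steps. The smoothing
    actually used in the paper may differ.
    """
    smoothed = [0.0] * len(rewards)
    carry = 0.0
    # Walk from the last step to the first so the goal reward propagates back.
    for t in range(len(rewards) - 1, -1, -1):
        carry = max(rewards[t], decay * carry)
        smoothed[t] = carry
    return smoothed


def shaped_reward(
    env_reward: float,
    observation: object,
    predictor: Callable[[object], float],
    weight: float = 0.5,
) -> float:
    """Add a weighted predicted reward to the environment reward.

    Assumption: simple additive shaping with a hypothetical `weight`.
    `predictor` stands in for a trained reward-prediction model.
    """
    return env_reward + weight * predictor(observation)


# One episode that reaches the goal only at the final step.
targets = smooth_sparse_rewards([0.0, 0.0, 0.0, 1.0], decay=0.5)

# A stand-in predictor that returns a constant estimate.
bonus = shaped_reward(0.0, "observation", lambda obs: 0.8, weight=0.5)
```

In this sketch the smoothed values would serve as regression targets for the reward predictor, so that states closer to the goal receive larger targets, and the shaped reward would replace the raw environment reward during policy learning.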
Taxonomy
Topics: Reinforcement Learning in Robotics · Adversarial Robustness in Machine Learning · Explainable Artificial Intelligence (XAI)
