Self-Supervised Online Reward Shaping in Sparse-Reward Environments

Farzan Memarian; Wonjoon Goo; Rudolf Lioutikov; Scott Niekum; Ufuk Topcu

arXiv:2103.04529 · cs.LG · July 27, 2021


TL;DR

This paper proposes Self-supervised Online Reward Shaping (SORS), a method that automatically densifies sparse rewards in reinforcement learning, improving sample efficiency while, under certain conditions, leaving the optimal policy unchanged.

Contribution

The paper introduces SORS, a framework that automatically infers dense rewards from sparse reward signals, speeding up learning while, under certain conditions, preserving the optimal policy of the original MDP.

Findings

01 SORS significantly improves sample efficiency in sparse-reward environments.
02 SORS achieves performance comparable to that obtained with hand-designed dense rewards.
03 Theoretical analysis shows that, under certain conditions, the reward transformation does not change the optimal policy.

Abstract

We introduce Self-supervised Online Reward Shaping (SORS), which aims to improve the sample efficiency of any RL algorithm in sparse-reward environments by automatically densifying rewards. The proposed framework alternates between classification-based reward inference and policy update steps -- the original sparse reward provides a self-supervisory signal for reward inference by ranking trajectories that the agent observes, while the policy update is performed with the newly inferred, typically dense reward function. We introduce theory that shows that, under certain conditions, this alteration of the reward function will not change the optimal policy of the original MDP, while potentially increasing learning speed significantly. Experimental results on several sparse-reward environments demonstrate that, across multiple domains, the proposed algorithm is not only significantly more…
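The reward-inference half of the alternating scheme described above can be illustrated with a minimal sketch. It assumes a linear reward over state features and a Bradley-Terry style pairwise logistic loss (the trajectory-ranking classifier popularized by T-REX); the paper's actual reward model, loss, and hyperparameters may differ, and every name below (`infer_reward`, `dense_reward`, the toy 1-D task) is hypothetical. The sparse return only supplies the label "which of two trajectories is better"; the policy update that would consume `dense_reward` is not shown.

```python
# Hypothetical sketch (assumed, not the authors' code): infer a dense linear
# reward by classifying which of two trajectories has the higher sparse return.
import math
import random
from typing import List, Sequence, Tuple

State = Tuple[float, ...]
Trajectory = List[State]


def sigmoid(x: float) -> float:
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def feature_sum(traj: Trajectory) -> List[float]:
    # With a linear reward r(s) = theta . s, a trajectory's predicted return
    # is theta . (sum of its state features).
    total = [0.0] * len(traj[0])
    for s in traj:
        for k, v in enumerate(s):
            total[k] += v
    return total


def dense_reward(theta: Sequence[float], s: State) -> float:
    # Inferred per-state reward (assumed linear in the state features).
    return sum(t * x for t, x in zip(theta, s))


def infer_reward(trajs: List[Trajectory], sparse_returns: List[float],
                 steps: int = 2000, lr: float = 0.05, seed: int = 0) -> List[float]:
    """Fit theta so predicted returns rank trajectories like their sparse returns."""
    rng = random.Random(seed)
    feats = [feature_sum(t) for t in trajs]
    theta = [0.0] * len(feats[0])
    for _ in range(steps):
        i, j = rng.sample(range(len(trajs)), 2)
        if sparse_returns[i] == sparse_returns[j]:
            continue  # ties carry no ranking label
        if sparse_returns[i] < sparse_returns[j]:
            i, j = j, i  # make i the preferred (higher sparse return) trajectory
        diff = [a - b for a, b in zip(feats[i], feats[j])]
        margin = sum(t * d for t, d in zip(theta, diff))
        # Gradient descent step on the logistic loss -log sigmoid(margin).
        scale = lr * (1.0 - sigmoid(margin))
        theta = [t + scale * d for t, d in zip(theta, diff)]
    return theta


# Toy 1-D task: the state is a position x, and the sparse reward is 1 only
# if the trajectory ends at the goal x = 1.0.
trajs = [
    [(0.0,), (0.5,), (1.0,)],  # reaches the goal
    [(0.0,), (0.2,), (0.4,)],  # fails
    [(0.0,), (0.1,), (0.1,)],  # fails
]
sparse_returns = [1.0, 0.0, 0.0]
theta = infer_reward(trajs, sparse_returns)
print("theta =", theta)
```

On this toy task the learned weight on position comes out positive, so the inferred reward grows as the agent moves toward the goal even though the sparse reward is zero everywhere except on success. In a full SORS-style loop, this fit would be repeated on the growing set of trajectories the agent observes, alternating with policy updates that use the inferred reward.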


Code & Models

Repositories

hiwonjoon/IROS2021_SORS (TensorFlow, official)


Taxonomy

Topics: Data Stream Mining Techniques · Machine Learning and Algorithms · Reinforcement Learning in Robotics