Self-Supervised Online Reward Shaping in Sparse-Reward Environments
Farzan Memarian, Wonjoon Goo, Rudolf Lioutikov, Scott Niekum, Ufuk Topcu

TL;DR
This paper proposes Self-supervised Online Reward Shaping (SORS), a method that automatically densifies sparse rewards in reinforcement learning, improving sample efficiency without altering the optimal policy under certain conditions.
Contribution
The paper introduces SORS, a framework that automatically infers dense rewards from the sparse reward signal. This can speed up learning while, under certain conditions, preserving the optimal policy of the original MDP.
Findings
SORS significantly improves sample efficiency in sparse-reward environments.
SORS achieves performance comparable to that obtained with hand-designed dense rewards.
Theoretical analysis shows that, under certain conditions, the reward transformation leaves the optimal policy unchanged.
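To see why an order-preserving change of reward can leave the optimal policy unchanged, consider a simplified setting. This setting is an illustrative assumption and not the paper's exact theorem or conditions: a deterministic MDP with a fixed start state, where each deterministic policy π produces a single trajectory τ_π, and G and G' denote trajectory returns under the original sparse reward and the inferred reward.

```latex
\Big(\forall \tau, \tau' :\; G(\tau) > G(\tau') \iff G'(\tau) > G'(\tau')\Big)
\;\Longrightarrow\;
\arg\max_{\pi} G(\tau_\pi) = \arg\max_{\pi} G'(\tau_\pi)
```

A policy π* maximizes G exactly when no π satisfies G(τ_π) > G(τ_π*). By the equivalence, this holds exactly when no π satisfies G'(τ_π) > G'(τ_π*), so the two sets of optimal policies coincide.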
Abstract
We introduce Self-supervised Online Reward Shaping (SORS), which aims to improve the sample efficiency of any RL algorithm in sparse-reward environments by automatically densifying rewards. The proposed framework alternates between classification-based reward inference and policy update steps -- the original sparse reward provides a self-supervisory signal for reward inference by ranking trajectories that the agent observes, while the policy update is performed with the newly inferred, typically dense reward function. We introduce theory that shows that, under certain conditions, this alteration of the reward function will not change the optimal policy of the original MDP, while potentially increasing learning speed significantly. Experimental results on several sparse-reward environments demonstrate that, across multiple domains, the proposed algorithm is not only significantly more…
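The reward-inference half of the alternation described in the abstract can be illustrated with a minimal sketch. It rests on several assumptions that are not stated here: a linear reward over given state features, trajectories ranked by their sparse return, and a pairwise logistic (Bradley-Terry) loss in the style of T-REX. All function names are hypothetical, and the policy-update step is omitted.

```python
import numpy as np


def sparse_return(sparse_rewards):
    """Total original (sparse) reward collected along one trajectory."""
    return float(np.sum(sparse_rewards))


def make_ranked_pairs(trajectories):
    """Return (better, worse) index pairs ranked by sparse return.

    Pairs with equal sparse return carry no ranking signal and are skipped.
    """
    returns = [sparse_return(t["sparse_rewards"]) for t in trajectories]
    n = len(trajectories)
    return [(i, j) for i in range(n) for j in range(n) if returns[i] > returns[j]]


def fit_dense_reward(trajectories, n_features, lr=0.1, epochs=200):
    """Fit a hypothetical linear reward r(s) = theta . phi(s).

    For every pair (i, j) with sparse return G_i > G_j, minimise the
    pairwise logistic (Bradley-Terry) loss -log sigmoid(R_i - R_j),
    where R is the return predicted by the learned reward.
    """
    theta = np.zeros(n_features)
    pairs = make_ranked_pairs(trajectories)
    if not pairs:
        return theta  # no ranking signal yet, e.g. the goal was never reached
    # Summing features over time makes the predicted return R = theta . Phi.
    phi = [np.sum(t["features"], axis=0) for t in trajectories]
    for _ in range(epochs):
        grad = np.zeros(n_features)
        for i, j in pairs:
            diff = phi[i] - phi[j]
            p_correct = 1.0 / (1.0 + np.exp(-(theta @ diff)))
            # Gradient of -log sigmoid(theta . diff) with respect to theta.
            grad -= (1.0 - p_correct) * diff
        theta -= lr * grad / len(pairs)
    return theta


def dense_reward(theta, features):
    """Per-step inferred reward for a (T, d) array of state features."""
    return features @ theta


# Toy usage: feature 0 marks "near the goal", feature 1 marks "far away";
# only the first trajectory ever receives the sparse reward.
trajs = [
    {"features": np.array([[1.0, 0.0]] * 3), "sparse_rewards": [0, 0, 1]},
    {"features": np.array([[0.0, 1.0]] * 3), "sparse_rewards": [0, 0, 0]},
]
theta = fit_dense_reward(trajs, n_features=2)
print(dense_reward(theta, trajs[0]["features"]))  # nonzero reward at every step
# In a full loop, an RL algorithm would now be trained on dense_reward(...),
# new trajectories collected, and the reward re-fitted (omitted here).
```

Skipping tied pairs matters in sparse-reward settings. Early in training most trajectories share a sparse return of zero, so only pairs that differ in sparse return supply supervision for the inferred reward.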
Taxonomy
Topics: Data Stream Mining Techniques · Machine Learning and Algorithms · Reinforcement Learning in Robotics
