Learning to Shape Rewards using a Game of Two Partners
David Mguni, Taher Jafferjee, Jianhong Wang, Nicolas Perez-Nieves, Tianpei Yang, Matthew Taylor, Wenbin Song, Feifei Tong, Hui Chen, Jiangcheng Zhu, Jun Wang, Yaodong Yang

TL;DR
ROSA is an automated reward shaping framework where two agents collaborate in a Markov game to construct effective shaping rewards, improving learning efficiency in sparse reward environments without manual reward engineering.
Contribution
We propose ROSA, a novel automated reward shaping method using a two-agent Markov game, eliminating manual reward design and enhancing learning in sparse reward tasks.
Findings
ROSA effectively constructs beneficial shaping rewards.
ROSA outperforms state-of-the-art reward shaping algorithms.
ROSA ensures efficient convergence to high-performance policies.
Abstract
Reward shaping (RS) is a powerful method in reinforcement learning (RL) for overcoming the problem of sparse or uninformative rewards. However, RS typically relies on manually engineered shaping-reward functions whose construction is time-consuming and error-prone. It also requires domain knowledge, which runs contrary to the goal of autonomous learning. We introduce Reinforcement Learning Optimising Shaping Algorithm (ROSA), an automated reward shaping framework in which the shaping-reward function is constructed in a Markov game between two agents. A reward-shaping agent (Shaper) uses switching controls to determine in which states to add shaping rewards for more efficient learning, while the other agent (Controller) learns the optimal policy for the task using these shaped rewards. We prove that ROSA, which adopts existing RL algorithms, learns to construct a shaping-reward function that…
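The division of labour described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the class `SwitchingShaper`, the function `q_update`, the fixed set of active states, and the potential-difference shaping term are all hypothetical stand-ins chosen for illustration. In ROSA the Shaper learns where to switch shaping on; here the switch is hard-coded so that the data flow between the two agents is easy to follow.

```python
from dataclasses import dataclass
from typing import Callable, List, Set


@dataclass
class SwitchingShaper:
    """Hypothetical Shaper: adds a shaping term only where its switch is on.

    Assumption: the switch is a fixed set of states. In ROSA the Shaper
    learns this decision; it is hard-coded here for illustration only.
    """
    active_states: Set[int]
    shaping_fn: Callable[[int, int], float]

    def switch(self, state: int) -> bool:
        # Switching control: decide whether to shape in this state.
        return state in self.active_states

    def shaped_reward(self, env_reward: float, state: int, next_state: int) -> float:
        # Outside the active states the Controller sees the raw reward.
        if self.switch(state):
            return env_reward + self.shaping_fn(state, next_state)
        return env_reward


def q_update(q: List[List[float]], state: int, action: int, reward: float,
             next_state: int, alpha: float = 0.5, gamma: float = 0.9) -> None:
    """Hypothetical Controller step: one tabular Q-learning update."""
    best_next = max(q[next_state])
    q[state][action] += alpha * (reward + gamma * best_next - q[state][action])


# Assumed shaping term: a potential difference with potential phi(s) = s.
GAMMA = 0.9
shaper = SwitchingShaper(
    active_states={0, 1},
    shaping_fn=lambda s, s_next: GAMMA * s_next - s,
)

# Controller learns from the shaped reward rather than the sparse one.
q_table = [[0.0, 0.0] for _ in range(3)]
sparse_reward = 0.0
r_shaped = shaper.shaped_reward(sparse_reward, state=1, next_state=2)
q_update(q_table, state=1, action=0, reward=r_shaped, next_state=2, gamma=GAMMA)
```

The Controller's update rule is unchanged; only the reward signal it receives differs, which matches the abstract's statement that ROSA adopts existing RL algorithms.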
Taxonomy
Topics: Reinforcement Learning in Robotics · Advanced Bandit Algorithms Research · Data Stream Mining Techniques
