Learning to Shape Rewards using a Game of Two Partners

David Mguni; Taher Jafferjee; Jianhong Wang; Nicolas Perez-Nieves; Tianpei Yang; Matthew Taylor; Wenbin Song; Feifei Tong; Hui Chen; Jiangcheng Zhu; Jun Wang; Yaodong Yang

arXiv:2103.09159 · cs.LG · February 7, 2023

TL;DR

ROSA is an automated reward shaping framework where two agents collaborate in a Markov game to construct effective shaping rewards, improving learning efficiency in sparse reward environments without manual reward engineering.

Contribution

We propose ROSA, an automated reward shaping method built on a two-agent Markov game. It removes the need for manual reward design and improves learning in sparse-reward tasks.

Findings

01 ROSA effectively constructs beneficial shaping rewards.

02 ROSA outperforms state-of-the-art reward shaping algorithms.

03 ROSA ensures efficient convergence to high-performance policies.

Abstract

Reward shaping (RS) is a powerful method in reinforcement learning (RL) for overcoming the problem of sparse or uninformative rewards. However, RS typically relies on manually engineered shaping-reward functions, whose construction is time-consuming and error-prone. It also requires domain knowledge, which runs contrary to the goal of autonomous learning. We introduce Reinforcement Learning Optimising Shaping Algorithm (ROSA), an automated reward shaping framework in which the shaping-reward function is constructed in a Markov game between two agents. A reward-shaping agent (Shaper) uses switching controls to determine the states at which to add shaping rewards for more efficient learning, while the other agent (Controller) learns the optimal policy for the task using these shaped rewards. We prove that ROSA, which adopts existing RL algorithms, learns to construct a shaping-reward function that…
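The division of labour described in the abstract can be illustrated with a small sketch. It is not the authors' implementation. The chain environment, the potential function, the tabular Q-learning Controller, and every name in the code are assumptions made for illustration. In particular, the switch is a fixed, hand-picked set of states (`SHAPED_STATES`), whereas in ROSA the Shaper learns where to switch shaping on. The sketch shows only how a per-state switch gates the shaping reward that the Controller learns from.

```python
import random

N_STATES = 8                  # toy chain 0..7; the only task reward is at state 7 (assumed task)
ACTIONS = (-1, +1)            # move left, move right
GAMMA = 0.95
SHAPED_STATES = {0, 1, 2, 3}  # hypothetical fixed switch: shaping is "on" only in these states


def potential(state):
    """Hypothetical potential used to build the shaping term."""
    return 0.05 * state


def shaped_reward(reward, state, next_state):
    """Task reward plus a potential-style bonus, gated by the switch."""
    if state not in SHAPED_STATES:   # switch off: the Controller sees the raw reward
        return reward
    return reward + GAMMA * potential(next_state) - potential(state)


def step(state, action):
    """Deterministic chain; reward 1.0 only on reaching the last state."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done


def train(episodes=2000, max_steps=100, epsilon=0.3, seed=0):
    """Tabular Q-learning for the Controller, trained on the shaped reward."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state = 0
        for _ in range(max_steps):
            if rng.random() < epsilon:          # explore
                a = rng.randrange(2)
            else:                               # exploit, breaking ties at random
                best = max(q[state])
                a = rng.choice([i for i in range(2) if q[state][i] == best])
            next_state, reward, done = step(state, ACTIONS[a])
            target = shaped_reward(reward, state, next_state)
            if not done:
                target += GAMMA * max(q[next_state])
            q[state][a] = target   # learning rate 1: the toy task is deterministic
            state = next_state
            if done:
                break
    return q


def greedy_policy(q):
    """Greedy move (-1 or +1) for every non-terminal state."""
    return [ACTIONS[max(range(2), key=lambda i: q[s][i])] for s in range(N_STATES - 1)]


if __name__ == "__main__":
    print(greedy_policy(train()))
```

Calling `greedy_policy(train())` returns the Controller's preferred move in each non-terminal state. Replacing the fixed `SHAPED_STATES` set with a second learner that chooses when to switch shaping on would bring the sketch closer to the two-agent setup the abstract describes.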

Taxonomy

Topics: Reinforcement Learning in Robotics · Advanced Bandit Algorithms Research · Data Stream Mining Techniques