Learning to Utilize Shaping Rewards: A New Approach of Reward Shaping

Yujing Hu; Weixun Wang; Hangtian Jia; Yixiang Wang; Yingfeng Chen; Jianye Hao; Feng Wu; Changjie Fan

arXiv:2011.02669 · cs.LG · November 6, 2020 · 94 cites

TL;DR

This paper introduces a novel adaptive method for utilizing reward shaping in reinforcement learning by formulating it as a bi-level optimization problem, improving performance by selectively leveraging shaping rewards.

Contribution

The paper proposes a bi-level optimization framework, with accompanying algorithms, for adaptively utilizing shaping rewards, which addresses imperfections in human-designed shaping reward functions.
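The bi-level structure can be written out as a rough sketch. The notation here is an assumption made for illustration and is not taken from this page: $r$ is the true reward, $f$ the given shaping reward, $z_\phi$ the parameterized shaping weight function, $\pi_\theta$ the policy, and $\gamma$ the discount factor. The additive, weighted form of the lower-level reward is likewise an assumed instantiation.

```latex
% Upper level: choose the shaping weights to maximize the expected true reward.
\max_{\phi} \; J(\pi_{\theta}) = \mathbb{E}_{\pi_{\theta}}\!\left[ \sum_{t} \gamma^{t}\, r(s_t, a_t) \right]

% Lower level: the policy is optimized under the weighted shaped reward.
\text{s.t.} \quad \theta = \arg\max_{\theta'} \;
  \mathbb{E}_{\pi_{\theta'}}\!\left[ \sum_{t} \gamma^{t}
  \bigl( r(s_t, a_t) + z_{\phi}(s_t, a_t)\, f(s_t, a_t) \bigr) \right]
```

Under this reading, the upper-level objective depends on $\phi$ only through the policy that the lower level produces.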

Findings

01 The proposed algorithms effectively exploit beneficial shaping rewards.

02 The same algorithms ignore or transform unbeneficial shaping rewards.

03 Performance improves in sparse-reward environments.

Abstract

Reward shaping is an effective technique for incorporating domain knowledge into reinforcement learning (RL). Existing approaches such as potential-based reward shaping normally make full use of a given shaping reward function. However, since the transformation of human knowledge into numeric reward values is often imperfect due to reasons such as human cognitive bias, completely utilizing the shaping reward function may fail to improve the performance of RL algorithms. In this paper, we consider the problem of adaptively utilizing a given shaping reward function. We formulate the utilization of shaping rewards as a bi-level optimization problem, where the lower level is to optimize policy using the shaping rewards and the upper level is to optimize a parameterized shaping weight function for true reward maximization. We formally derive the gradient of the expected true reward with…
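The two ideas named in the abstract can be illustrated with a small Python sketch. It is not the authors' implementation: the chain environment, the goal state `GOAL`, the potential function, and the constant weight functions are all hypothetical, and combining the true reward with a weighted shaping term additively is an assumption. The sketch builds a standard potential-based shaping reward, gamma * Phi(s') - Phi(s), and shows how a per-state-action weight can use it fully, ignore it, or flip its sign.

```python
from typing import Callable

State = int
Action = int
Transition = Callable[[State, Action, State], float]


def potential_based_shaping(potential: Callable[[State], float],
                            gamma: float) -> Transition:
    """Build the shaping reward F(s, a, s') = gamma * Phi(s') - Phi(s)."""
    def shaping(s: State, a: Action, s_next: State) -> float:
        return gamma * potential(s_next) - potential(s)
    return shaping


def weighted_shaped_reward(true_reward: Transition,
                           shaping: Transition,
                           weight: Callable[[State, Action], float]) -> Transition:
    """Combine the true reward with a weighted shaping term (assumed additive form)."""
    def reward(s: State, a: Action, s_next: State) -> float:
        return true_reward(s, a, s_next) + weight(s, a) * shaping(s, a, s_next)
    return reward


# Hypothetical sparse-reward chain: reward 1 only on reaching GOAL.
GOAL = 5


def true_reward(s: State, a: Action, s_next: State) -> float:
    return 1.0 if s_next == GOAL else 0.0


def potential(s: State) -> float:
    # Hypothetical domain knowledge: states closer to GOAL have higher potential.
    return -float(abs(GOAL - s))


shaping = potential_based_shaping(potential, gamma=0.5)

# Three fixed weight functions; a learned weight function would vary with (s, a).
full = weighted_shaped_reward(true_reward, shaping, lambda s, a: 1.0)
ignored = weighted_shaped_reward(true_reward, shaping, lambda s, a: 0.0)
flipped = weighted_shaped_reward(true_reward, shaping, lambda s, a: -1.0)

for name, fn in [("full", full), ("ignored", ignored), ("flipped", flipped)]:
    print(name, fn(2, 0, 3))
```

In the setting the abstract describes, the weight would come from a parameterized function optimized at the upper level rather than from the fixed constants used here.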

Videos

Learning to Utilize Shaping Rewards: A New Approach of Reward Shaping · SlidesLive

Taxonomy

Topics: Reinforcement Learning in Robotics · Software Engineering Research · Robot Manipulation and Learning