Reward Shaping via Meta-Learning

Haosheng Zou; Tongzheng Ren; Dong Yan; Hang Su; Jun Zhu

arXiv:1901.09330 · cs.LG · January 29, 2019 · 41 cites

TL;DR

This paper introduces a meta-learning framework for automatic reward shaping in reinforcement learning. By learning a prior over reward shaping that is shared across a distribution of related tasks, it improves learning efficiency on newly sampled tasks.

Contribution

It presents a theoretically grounded meta-learning approach to automatically learn reward shaping priors, reducing the need for expert knowledge and hand-engineering.
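Automatically learned shaping is only safe if it cannot change which policy is optimal. Below is a minimal sketch of why, assuming the shaping takes the standard potential-based form F(s, s') = γΦ(s') − Φ(s) of Ng, Harada & Russell (1999), with Φ = 0 at termination. This is an illustration rather than the paper's exact formulation, and the helpers `discounted_return` and `shape_episode` are hypothetical. Summing the shaped rewards telescopes, so every episode's discounted return drops by exactly Φ(s_0), whatever Φ is.

```python
import numpy as np

def discounted_return(rewards, gamma):
    """Sum over t of gamma**t * r_t for one finite episode."""
    return sum(gamma ** t * r for t, r in enumerate(rewards))

def shape_episode(states, rewards, phi, gamma):
    """Potential-based shaping of one episode (hypothetical helper).

    `states` holds s_0 .. s_T (one more entry than `rewards`); s_T is terminal,
    so its potential is taken as 0, as policy invariance in episodic tasks requires.
    """
    T = len(rewards)
    shaped = []
    for t, r in enumerate(rewards):
        phi_next = 0.0 if t + 1 == T else phi[states[t + 1]]
        shaped.append(r + gamma * phi_next - phi[states[t]])
    return shaped

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    gamma = 0.95
    phi = rng.normal(size=10)             # arbitrary potential, e.g. an imperfect learned prior
    states = rng.integers(0, 10, size=8)  # s_0 .. s_7
    rewards = rng.normal(size=7)          # r_0 .. r_6
    g = discounted_return(rewards, gamma)
    g_shaped = discounted_return(shape_episode(states, rewards, phi, gamma), gamma)
    print(np.isclose(g_shaped, g - phi[states[0]]))  # True
```

The shift depends only on the start state, so every policy's expected return from a given state moves by the same amount. The optimal policy therefore stays the same, and shaping only changes when reward arrives, which is what eases credit assignment.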

Findings

01 Enhanced learning efficiency across tasks
02 Successful transfer from DQN to DDPG
03 Interpretable reward shaping visualizations

Abstract

Reward shaping is one of the most effective methods to tackle the crucial yet challenging problem of credit assignment in Reinforcement Learning (RL). However, designing shaping functions usually requires much expert knowledge and hand-engineering, and the difficulties are further exacerbated given multiple similar tasks to solve. In this paper, we consider reward shaping on a distribution of tasks, and propose a general meta-learning framework to automatically learn the efficient reward shaping on newly sampled tasks, assuming only shared state space but not necessarily action space. We first derive the theoretically optimal reward shaping in terms of credit assignment in model-free RL. We then propose a value-based meta-learning algorithm to extract an effective prior over the optimal reward shaping. The prior can be applied directly to new tasks, or provably adapted to the…
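The abstract describes two steps: an optimal shaping derived in terms of credit assignment, and a meta-learned prior over that shaping. Both can be illustrated on a toy problem. This is a hedged sketch, not the authors' algorithm, and it rests on three assumptions:

- Shaping is potential-based, with the optimal state value V* as the ideal potential.
- The tasks are a hypothetical 8-state deterministic chain that differ only in goal position, so the state space is shared.
- The batched Reptile update (Nichol et al., 2018) stands in for the paper's value-based meta-learner because it fits in a few lines.

With Φ = V* and deterministic transitions, the shaped reward r + γV*(s') − V*(s) equals the advantage Q*(s, a) − V*(s). Optimal moves score 0, and suboptimal ones are penalized immediately.

```python
import numpy as np

GAMMA = 0.9
N_STATES = 8
ACTIONS = (-1, +1)  # move left / move right

def step(s, a, goal, n=N_STATES):
    """Deterministic chain transition (hypothetical toy task): returns (next_state, reward, done)."""
    s_next = min(max(s + a, 0), n - 1)
    done = s_next == goal
    return s_next, (1.0 if done else 0.0), done

def optimal_values(goal, n=N_STATES, gamma=GAMMA, iters=50):
    """Tabular value iteration; the goal is terminal, so V*(goal) = 0."""
    v = np.zeros(n)
    for _ in range(iters):
        new_v = np.zeros(n)
        for s in range(n):
            if s == goal:
                continue
            returns = []
            for a in ACTIONS:
                s_next, r, done = step(s, a, goal, n)
                returns.append(r + (0.0 if done else gamma * v[s_next]))
            new_v[s] = max(returns)
        v = new_v
    return v

def shaped_reward(r, s, s_next, done, phi, gamma=GAMMA):
    """Potential-based shaping r + gamma * Phi(s') - Phi(s), with Phi = 0 after termination."""
    phi_next = 0.0 if done else phi[s_next]
    return r + gamma * phi_next - phi[s]

def reptile_prior(train_goals, meta_iters=300, inner_steps=5, inner_lr=0.5, meta_lr=0.1):
    """Batched Reptile over tabular potentials; each task's inner loop regresses onto its V*."""
    targets = [optimal_values(g) for g in train_goals]
    phi = np.zeros(N_STATES)
    for _ in range(meta_iters):
        adapted = []
        for target in targets:
            w = phi.copy()
            for _ in range(inner_steps):
                w -= inner_lr * (w - target)  # gradient step on 0.5 * ||w - target||^2
            adapted.append(w)
        phi += meta_lr * (np.mean(adapted, axis=0) - phi)  # move prior toward adapted weights
    return phi

if __name__ == "__main__":
    goal = 7
    v_star = optimal_values(goal)
    for s in range(goal):
        scores = []
        for a in ACTIONS:
            s_next, r, done = step(s, a, goal)
            scores.append(round(shaped_reward(r, s, s_next, done, v_star), 4))
        print(s, scores)  # [left, right] shaped rewards under Phi = V*
    print(np.round(reptile_prior([5, 7]), 3))
```

For each state of the goal-7 task, the script prints the shaped rewards for moving left and right. The rightward (optimal) move scores 0 and the leftward move scores negative. The last line is the meta-learned prior. In this tabular, squared-loss setting, the batched Reptile update converges to the average of the training tasks' V*. That average is not the optimal potential for any single task, so the few-step adaptation mentioned in the abstract is what would tailor it to a new task.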

Taxonomy

Topics: Reinforcement Learning in Robotics · Adversarial Robustness in Machine Learning · Autonomous Vehicle Technology and Safety

Methods: Weight Decay · Adam · Batch Normalization · Experience Replay · Deep Deterministic Policy Gradient · Q-Learning · Dense Connections · Convolution · Deep Q-Network