Shaping Rewards for Reinforcement Learning with Imperfect Demonstrations using Generative Models

Yuchen Wu; Melissa Mozifian; Florian Shkurti

arXiv:2011.01298 · cs.RO · November 4, 2020 · 6 cites

TL;DR

This paper introduces a reward shaping method for reinforcement learning that leverages generative models trained on imperfect demonstrations to improve learning efficiency and robustness in robotic tasks.

Contribution

The paper proposes using a generative model, trained on suboptimal demonstrations, as a reward shaping potential, with the aim of improving data efficiency and robustness in reinforcement learning.

Findings

01. Accelerates policy learning by focusing exploration on high-value regions.

02. Effective with suboptimal and noisy demonstration data.

03. Validated through extensive simulations and real robot experiments.

Abstract

The potential benefits of model-free reinforcement learning to real robotics systems are limited by its uninformed exploration that leads to slow convergence, lack of data-efficiency, and unnecessary interactions with the environment. To address these drawbacks we propose a method that combines reinforcement and imitation learning by shaping the reward function with a state-and-action-dependent potential that is trained from demonstration data, using a generative model. We show that this accelerates policy learning by specifying high-value areas of the state and action space that are worth exploring first. Unlike the majority of existing methods that assume optimal demonstrations and incorporate the demonstration data as hard constraints on policy optimization, we instead incorporate demonstration data as advice in the form of a reward shaping potential trained as a generative model of…
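The shaping idea in the abstract can be illustrated with a minimal sketch. It assumes the standard potential-based form with a state-and-action-dependent potential, r' = r + γ·Φ(s', a') − Φ(s, a), where Φ is a scaled log-density of demonstration (state, action) pairs. This page lists normalizing flows as the method; the sketch below swaps in a single full-covariance Gaussian as a simple stand-in density. The function names (`fit_gaussian`, `potential`, `shaped_reward`), the `scale` parameter, and the zero-potential convention at terminal steps are hypothetical choices for illustration and are not taken from the paper.

```python
import numpy as np

def fit_gaussian(demos):
    """Fit a full-covariance Gaussian to rows of concatenated (state, action).

    Stand-in for the generative model; a small ridge keeps the covariance invertible.
    """
    mean = demos.mean(axis=0)
    cov = np.cov(demos, rowvar=False) + 1e-6 * np.eye(demos.shape[1])
    return mean, cov

def log_density(x, mean, cov):
    """Log-density of a multivariate Gaussian evaluated at x."""
    diff = x - mean
    dim = mean.shape[0]
    _, logdet = np.linalg.slogdet(cov)
    maha = diff @ np.linalg.solve(cov, diff)
    return -0.5 * (dim * np.log(2.0 * np.pi) + logdet + maha)

def potential(state, action, mean, cov, scale=1.0):
    """Potential Phi(s, a): scaled log-density under the demonstration model."""
    return scale * log_density(np.concatenate([state, action]), mean, cov)

def shaped_reward(r, s, a, s_next, a_next, mean, cov, gamma=0.99, done=False):
    """Return r + gamma * Phi(s', a') - Phi(s, a).

    Assumption: the potential of the successor is taken as zero at terminal steps.
    """
    phi = potential(s, a, mean, cov)
    phi_next = 0.0 if done else potential(s_next, a_next, mean, cov)
    return r + gamma * phi_next - phi
```

Under this sketch, transitions that move toward state-action pairs resembling the demonstrations receive a bonus, while the environment reward `r` still enters unchanged, so the demonstrations act as advice rather than as a hard constraint on the policy.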

Taxonomy

Topics: Reinforcement Learning in Robotics · Adversarial Robustness in Machine Learning · Generative Adversarial Networks and Image Synthesis

Methods: Normalizing Flows