Reinforcement learning with Demonstrations from Mismatched Task under Sparse Reward

Yanjiang Guo; Jingyue Gao; Zheng Wu; Chengming Shi; Jianyu Chen

arXiv:2212.01509 · cs.RO · March 9, 2023

TL;DR

This paper introduces CRSfD, a method that improves robot reinforcement learning under sparse rewards by shaping rewards conservatively from demonstrations. It is especially effective when the demonstrations come from a similar but mismatched task.

Contribution

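The paper proposes CRSfD, a reward shaping technique that uses an expert value function estimated from demonstrations to guide learning in mismatched tasks. This addresses a limitation of existing learning-from-demonstration (LfD) methods, which assume that the expert and the learner share the same task.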
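The contribution depends on an estimated expert value function. This page does not describe the paper's actual estimator, so the following is only a hedged illustration of one common starting point. It computes Monte Carlo discounted returns along a single sparse-reward demonstration. Those per-state targets could then be regressed by a value network. The function name `discounted_returns` and the discount factor are hypothetical choices, not taken from the paper.

```python
from typing import List


def discounted_returns(rewards: List[float], gamma: float = 0.99) -> List[float]:
    """Monte Carlo return G_t = sum_k gamma^k * r_{t+k} for every step of one trajectory.

    Hypothetical helper: these returns serve as value targets for demonstration states.
    """
    returns = [0.0] * len(rewards)
    running = 0.0
    # Sweep backwards so each G_t reuses G_{t+1}: G_t = r_t + gamma * G_{t+1}.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


# Sparse reward: zero everywhere except 1.0 on the step that completes the task.
demo_rewards = [0.0, 0.0, 0.0, 1.0]
targets = discounted_returns(demo_rewards, gamma=0.9)
print([round(v, 3) for v in targets])  # [0.729, 0.81, 0.9, 1.0]
```

Under a sparse reward, the value target decays geometrically with the number of steps remaining to task completion. This gradient is what makes a value estimate useful as a dense learning signal.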

Findings

1. CRSfD outperforms baseline methods on robot manipulation tasks.
2. The approach effectively transfers demonstrations to similar but different tasks.
3. CRSfD accelerates learning by guiding exploration around demonstrations.

Abstract

Reinforcement learning often suffers from sparse rewards in real-world robotics problems. Learning from demonstration (LfD) is an effective way to alleviate this problem by leveraging collected expert data to aid online learning. Prior works often assume that the learning agent and the expert aim to accomplish the same task, which requires collecting new data for every new task. In this paper, we consider the case where the target task differs from, but is similar to, that of the expert. This setting can be challenging, and we find that existing LfD methods cannot effectively guide learning in mismatched new tasks with sparse rewards. We propose conservative reward shaping from demonstration (CRSfD), which shapes the sparse rewards using an estimated expert value function. To accelerate learning, CRSfD guides the agent to conservatively explore around demonstrations.…
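The shaping idea in the abstract can be sketched with standard potential-based reward shaping, r' = r + γΦ(s') - Φ(s), where an estimated expert value serves as the potential Φ. This is a minimal sketch under stated assumptions, not the authors' implementation.

The potential is a hypothetical nearest-neighbour lookup over demonstration states with given value estimates. It is lowered in proportion to the distance from the demonstrations, so states far from them look unattractive. This is one simple way to encode "conservative" exploration, and the paper may achieve conservativeness differently. The names `conservative_potential`, `shaped_reward`, and `penalty` are illustrative.

```python
import math
from typing import Sequence, Tuple

State = Tuple[float, ...]
# Each entry pairs a demonstration state with its (hypothetical) estimated expert value.
DemoValues = Sequence[Tuple[State, float]]


def conservative_potential(state: State, demo: DemoValues, penalty: float = 1.0) -> float:
    """Illustrative potential: max over demo states of (value - penalty * distance).

    The farther a state lies from every demonstration state, the lower its
    potential. The shaping term therefore discourages drifting away from the demos.
    This distance penalty is an assumption, not the paper's formulation.
    """
    if not demo:
        raise ValueError("need at least one demonstration state")
    return max(value - penalty * math.dist(state, demo_state)
               for demo_state, value in demo)


def shaped_reward(reward: float, state: State, next_state: State, done: bool,
                  demo: DemoValues, gamma: float = 0.99, penalty: float = 1.0) -> float:
    """Potential-based shaping: r + gamma * phi(s') - phi(s), with phi(terminal) = 0."""
    phi_s = conservative_potential(state, demo, penalty)
    phi_next = 0.0 if done else conservative_potential(next_state, demo, penalty)
    return reward + gamma * phi_next - phi_s


# Toy 2-D demonstration whose values are discounted sparse-reward returns (gamma = 0.9).
demo = [((0.0, 0.0), 0.81), ((1.0, 0.0), 0.9), ((2.0, 0.0), 1.0)]
along = shaped_reward(0.0, (1.0, 0.0), (2.0, 0.0), False, demo, gamma=0.9)
away = shaped_reward(0.0, (1.0, 0.0), (1.0, 1.0), False, demo, gamma=0.9)
print(along, away)
```

With these toy values, stepping along the demonstration yields a shaped reward of zero. Stepping one unit away from it yields roughly -0.99, which pushes exploration back toward the demonstrated states. Fixing Φ at terminal states to zero keeps the sketch consistent with the usual potential-based shaping convention.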

Taxonomy

Topics: Reinforcement Learning in Robotics · Robot Manipulation and Learning