Learning from demonstrations with SACR2: Soft Actor-Critic with Reward Relabeling

Jesus Bujalance Martin; Raphael Chekroun; Fabien Moutarde

arXiv:2110.14464 · cs.LG · December 6, 2021 · 1 citation

Open Access

TL;DR

This paper introduces SACR2, a reinforcement learning method that enhances the Soft Actor-Critic algorithm with reward relabeling from demonstrations and successful episodes, improving learning efficiency in robotic manipulation tasks.

Contribution

The paper proposes SACR2, a novel reward relabeling technique for off-policy RL that leverages demonstration data and successful episodes to improve learning in sparse-reward environments.
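
The idea of relabeling rewards before storing transitions can be illustrated with a minimal sketch. This summary does not state the relabeling rule, so the code below assumes one simple possibility: a constant bonus added to every transition of a demonstration or a successful episode. The names `Transition`, `relabel_episode`, `ReplayBuffer`, and the `bonus` parameter are hypothetical and chosen only for illustration; they are not taken from the paper.

```python
import random
from collections import deque
from dataclasses import dataclass, replace
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class Transition:
    """One environment step as stored in an off-policy replay buffer."""
    state: Tuple[float, ...]
    action: Tuple[float, ...]
    reward: float
    next_state: Tuple[float, ...]
    done: bool


def relabel_episode(episode: Sequence[Transition], success: bool,
                    bonus: float = 1.0) -> List[Transition]:
    """Return a copy of the episode with relabeled rewards.

    ASSUMPTION: the rule used here (add a constant bonus to every
    transition of a successful episode) is illustrative only.
    Unsuccessful episodes are returned unchanged.
    """
    if not success:
        return list(episode)
    return [replace(t, reward=t.reward + bonus) for t in episode]


class ReplayBuffer:
    """Fixed-capacity FIFO buffer that relabels episodes on insertion."""

    def __init__(self, capacity: int) -> None:
        self._storage: deque = deque(maxlen=capacity)

    def add_episode(self, episode: Sequence[Transition], success: bool,
                    bonus: float = 1.0) -> None:
        # Demonstrations can be added through the same path by passing
        # success=True (again, an assumption of this sketch).
        self._storage.extend(relabel_episode(episode, success, bonus))

    def sample(self, batch_size: int) -> List[Transition]:
        # Uniform sampling without replacement, capped at the buffer size.
        items = list(self._storage)
        return random.sample(items, min(batch_size, len(items)))

    def __len__(self) -> int:
        return len(self._storage)
```

In this sketch an off-policy learner such as SAC would sample from the buffer as usual; only the stored rewards differ, so the update rule itself needs no change.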

Findings

01. SACR2 outperforms standard SAC in robotic reaching tasks.

02. Reward relabeling accelerates learning even without demonstrations.

03. The method improves sample efficiency and success rates.

Abstract

In recent years, deep reinforcement learning (DRL) has made successful incursions into complex decision-making applications such as robotics, autonomous driving, and video games. Off-policy algorithms tend to be more sample-efficient than their on-policy counterparts, and can additionally benefit from any off-policy data stored in the replay buffer. Expert demonstrations are a popular source of such data: the agent is exposed to successful states and actions early on, which can accelerate the learning process and improve performance. In the past, multiple ideas have been proposed to make good use of the demonstrations in the buffer, such as pretraining on demonstrations only or minimizing additional cost functions. We carry out a study to evaluate several of these ideas in isolation, to see which of them have the most significant impact. We also present a new method for sparse-reward…

Taxonomy

Topics: Reinforcement Learning in Robotics · Explainable Artificial Intelligence (XAI)