STIR$^2$: Reward Relabelling for combined Reinforcement and Imitation Learning on sparse-reward tasks

Jesus Bujalance Martin; Fabien Moutarde

arXiv:2201.03834 · cs.LG · March 1, 2023 · 1 citation

TL;DR

STIR$^2$ introduces a reward relabeling technique that enhances sample efficiency in sparse-reward reinforcement learning by leveraging demonstrations and online episodes, improving performance on robotic manipulation tasks.

Contribution

The paper proposes a novel reward relabeling method that combines demonstrations and online episodes for improved data efficiency in sparse-reward RL environments.

Findings

1. Improves performance of SAC and DDPG on robotic tasks.
2. Data efficiency surpasses baseline methods.
3. Effective in sparse-reward settings.

Abstract

In the search for more sample-efficient reinforcement-learning (RL) algorithms, a promising direction is to leverage as much external off-policy data as possible, such as expert demonstrations. In the past, multiple ideas have been proposed to make good use of the demonstrations added to the replay buffer, such as pretraining on demonstrations only or minimizing additional cost functions. We present a new method that can leverage both demonstrations and episodes collected online in any sparse-reward environment with any off-policy algorithm. Our method is based on a reward bonus given to demonstrations and successful episodes (via relabeling), encouraging expert imitation and self-imitation. Our experiments focus on several robotic-manipulation tasks across two different simulation environments. We show that our method based on reward relabeling improves the performance of the…
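The relabeling idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `Transition` type, the helper names, the success test (final reward at or above a threshold), the choice to add the bonus to every transition of a qualifying episode, and the bonus value are all assumptions made for illustration, since the abstract does not specify them.

```python
from dataclasses import dataclass, replace
from typing import List


@dataclass(frozen=True)
class Transition:
    """One replay-buffer entry (hypothetical layout, for illustration only)."""
    state: tuple
    action: tuple
    reward: float
    next_state: tuple
    done: bool


def is_successful(episode: List[Transition], success_reward: float = 1.0) -> bool:
    # Assumption: the sparse reward marks success on the final transition.
    return bool(episode) and episode[-1].reward >= success_reward


def relabel_with_bonus(episode: List[Transition], bonus: float) -> List[Transition]:
    # Assumption: the bonus is added to every transition of the episode.
    return [replace(t, reward=t.reward + bonus) for t in episode]


def store_episode(buffer: List[Transition], episode: List[Transition],
                  bonus: float, is_demo: bool = False) -> None:
    """Append an episode to the buffer, relabeling demos and successes."""
    if is_demo or is_successful(episode):
        episode = relabel_with_bonus(episode, bonus)
    buffer.extend(episode)


def make_episode(rewards: List[float]) -> List[Transition]:
    # Dummy states and actions; only the rewards matter for this sketch.
    last = len(rewards) - 1
    return [Transition((i,), (0,), r, (i + 1,), i == last)
            for i, r in enumerate(rewards)]


buffer: List[Transition] = []
store_episode(buffer, make_episode([0.0, 0.0, 0.0]), bonus=0.5)        # failed online episode
store_episode(buffer, make_episode([0.0, 0.0, 1.0]), bonus=0.5)        # successful online episode
store_episode(buffer, make_episode([0.0, 0.0]), bonus=0.5, is_demo=True)  # demonstration
rewards = [t.reward for t in buffer]
```

Because the relabeling only changes the rewards stored in the replay buffer, a sketch like this leaves the off-policy learner itself untouched, which is consistent with the abstract's claim that the method works with any off-policy algorithm.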

Taxonomy

Topics: Reinforcement Learning in Robotics

Methods: Balanced Selection