STIR$^2$: Reward Relabelling for combined Reinforcement and Imitation Learning on sparse-reward tasks
Jesus Bujalance Martin, Fabien Moutarde

TL;DR
STIR$^2$ introduces a reward relabeling technique that enhances sample efficiency in sparse-reward reinforcement learning by leveraging demonstrations and online episodes, improving performance on robotic manipulation tasks.
Contribution
The paper proposes a novel reward relabeling method that combines demonstrations and online episodes for improved data efficiency in sparse-reward RL environments.
Findings
Improves performance of SAC and DDPG on robotic tasks.
Data efficiency surpasses baseline methods.
Effective in sparse-reward settings.
Abstract
In the search for more sample-efficient reinforcement-learning (RL) algorithms, a promising direction is to leverage as much external off-policy data as possible, for instance expert demonstrations. In the past, multiple ideas have been proposed to make good use of the demonstrations added to the replay buffer, such as pretraining on demonstrations only or minimizing additional cost functions. We present a new method that can leverage both demonstrations and episodes collected online in any sparse-reward environment with any off-policy algorithm. Our method is based on a reward bonus given to demonstrations and successful episodes (via relabeling), encouraging expert imitation and self-imitation. Our experiments focus on several robotic-manipulation tasks across two different simulation environments. We show that our method based on reward relabeling improves the performance of the…
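The relabeling idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the names `Transition`, `RelabelingReplayBuffer`, `relabel_episode`, and `is_successful` are hypothetical, the bonus value of 1.0 is arbitrary, and the sketch assumes that the bonus is added uniformly to every transition of an episode and that success is signalled by a positive sparse reward on the final transition. The paper may define these details differently.

```python
from dataclasses import dataclass, replace
from typing import List, Tuple


@dataclass(frozen=True)
class Transition:
    """One environment step as stored in an off-policy replay buffer."""
    state: Tuple[float, ...]
    action: Tuple[float, ...]
    reward: float
    next_state: Tuple[float, ...]
    done: bool


def relabel_episode(episode: List[Transition], bonus: float) -> List[Transition]:
    """Return a copy of the episode with `bonus` added to every reward.

    Assumption: the bonus is applied uniformly to all transitions.
    """
    return [replace(t, reward=t.reward + bonus) for t in episode]


def is_successful(episode: List[Transition]) -> bool:
    """Assumption: a positive sparse reward on the last step means success."""
    return len(episode) > 0 and episode[-1].reward > 0


class RelabelingReplayBuffer:
    """Hypothetical buffer that relabels rewards before storing transitions."""

    def __init__(self, bonus: float = 1.0) -> None:
        self.bonus = bonus  # illustrative value, not taken from the paper
        self.storage: List[Transition] = []

    def add_demonstration(self, episode: List[Transition]) -> None:
        # Demonstrations always receive the bonus (expert imitation).
        self.storage.extend(relabel_episode(episode, self.bonus))

    def add_online_episode(self, episode: List[Transition]) -> None:
        # Online episodes receive the bonus only if they succeeded
        # (self-imitation); failed episodes are stored unchanged.
        if is_successful(episode):
            episode = relabel_episode(episode, self.bonus)
        self.storage.extend(episode)
```

Because the relabeling happens before transitions enter the buffer, any off-policy learner that samples from `storage` (SAC or DDPG, for example) could consume the relabeled rewards without changes to its update rule, which is consistent with the abstract's claim that the method works with any off-policy algorithm.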
Taxonomy
Topics: Reinforcement Learning in Robotics
Methods: Balanced Selection
