Reinforcement Learning with Sparse Rewards using Guidance from Offline Demonstration

Desik Rengarajan; Gargi Vaidya; Akshay Sarvesh; Dileep Kalathil; Srinivas Shakkottai

arXiv:2202.04628·cs.LG·February 15, 2022·26 cites

TL;DR

This paper introduces LOGO, an algorithm that leverages offline demonstration data to improve reinforcement learning efficiency in environments with sparse rewards, enabling faster learning and better performance.

Contribution

The paper proposes the LOGO algorithm that guides online RL using offline demonstration data without imitation, and extends it to handle censored observations, with theoretical analysis and practical validation.

Findings

1. LOGO outperforms state-of-the-art methods on benchmark environments.
2. LOGO achieves superior trajectory tracking and obstacle avoidance in robotic experiments.
3. Theoretical lower bounds on performance improvement are established.

Abstract

A major challenge in real-world reinforcement learning (RL) is the sparsity of reward feedback. Often, what is available is an intuitive but sparse reward function that only indicates whether the task is completed partially or fully. However, the lack of carefully designed, fine-grained feedback implies that most existing RL algorithms fail to learn an acceptable policy in a reasonable time frame. This is because of the large number of exploration actions that the policy has to perform before it gets any useful feedback that it can learn from. In this work, we address this challenging problem by developing an algorithm that exploits the offline demonstration data generated by a sub-optimal behavior policy for faster and more efficient online RL in such sparse reward settings. The proposed algorithm, which we call the Learning Online with Guidance Offline (LOGO) algorithm, merges a policy…
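To make the sparse-reward difficulty described in the abstract concrete, here is a minimal sketch, not taken from the paper or its repository. It assumes a hypothetical one-dimensional chain environment in which the agent starts at state 0, moves left or right uniformly at random, and receives a non-zero reward only on reaching the goal state. The names `sparse_reward`, `steps_until_first_reward`, and `success_rate` are illustrative assumptions.

```python
import random
from typing import Optional


def sparse_reward(state: int, goal: int) -> float:
    """Sparse signal: 1.0 only when the task is completed, 0.0 otherwise."""
    return 1.0 if state == goal else 0.0


def steps_until_first_reward(goal: int, max_steps: int,
                             rng: random.Random) -> Optional[int]:
    """Random walk on states 0..goal starting at 0 (hypothetical toy setup).

    Returns the step at which the first non-zero reward is observed,
    or None if no reward is seen within max_steps.
    """
    state = 0
    for step in range(1, max_steps + 1):
        # Uniformly random exploration action: move left or right,
        # clipped to the valid state range.
        state = max(0, min(goal, state + rng.choice((-1, 1))))
        if sparse_reward(state, goal) > 0:
            return step
    return None


def success_rate(goal: int, max_steps: int, episodes: int,
                 seed: int = 0) -> float:
    """Fraction of random-exploration episodes that ever see a reward."""
    rng = random.Random(seed)
    hits = sum(
        steps_until_first_reward(goal, max_steps, rng) is not None
        for _ in range(episodes)
    )
    return hits / episodes


if __name__ == "__main__":
    # Farther goals make reward feedback rarer under random exploration.
    for goal in (2, 10, 30):
        rate = success_rate(goal, max_steps=100, episodes=500)
        print(f"goal={goal:2d}  episodes with any reward: {rate:.2f}")
```

In this toy setting, most episodes with a distant goal end without any learning signal at all, which is the situation the abstract says offline demonstration data is meant to help with.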

Code & Models

Repositories

desikrengarajan/logo (PyTorch, official)

Videos

Reinforcement Learning with Sparse Rewards using Guidance from Offline Demonstration · SlidesLive

Taxonomy

Topics: Reinforcement Learning in Robotics · Advanced Bandit Algorithms Research · Optimization and Search Problems