Reinforcement Learning with Sparse Rewards using Guidance from Offline Demonstration
Desik Rengarajan, Gargi Vaidya, Akshay Sarvesh, Dileep Kalathil, Srinivas Shakkottai

TL;DR
This paper introduces LOGO, an algorithm that leverages offline demonstration data to improve reinforcement learning efficiency in environments with sparse rewards, enabling faster learning and better performance.
Contribution
The paper proposes the LOGO algorithm that guides online RL using offline demonstration data without imitation, and extends it to handle censored observations, with theoretical analysis and practical validation.
Findings
LOGO outperforms state-of-the-art methods on benchmark environments.
LOGO achieves superior trajectory tracking and obstacle avoidance in robotic experiments.
Theoretical lower bounds on performance improvement are established.
Abstract
A major challenge in real-world reinforcement learning (RL) is the sparsity of reward feedback. Often, what is available is an intuitive but sparse reward function that only indicates whether the task is completed partially or fully. However, the lack of carefully designed, fine-grained feedback implies that most existing RL algorithms fail to learn an acceptable policy in a reasonable time frame. This is because of the large number of exploration actions that the policy has to perform before it gets any useful feedback that it can learn from. In this work, we address this challenging problem by developing an algorithm that exploits the offline demonstration data generated by a sub-optimal behavior policy for faster and more efficient online RL in such sparse reward settings. The proposed algorithm, which we call the Learning Online with Guidance Offline (LOGO) algorithm, merges a policy…
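To make the sparse-reward setting described in the abstract concrete, the following is a minimal Python sketch. It is not taken from the paper: the 1-D goal-reaching task, the function names, and the tolerance `tol` are all hypothetical, chosen only to contrast a completion-only reward with a shaped one.

```python
from typing import Sequence


def sparse_reward(position: float, goal: float, tol: float = 0.05) -> float:
    """Return 1.0 only when the agent is within `tol` of the goal, else 0.0.

    Hypothetical completion-only signal; not the paper's reward function.
    """
    return 1.0 if abs(position - goal) <= tol else 0.0


def dense_reward(position: float, goal: float) -> float:
    """Shaped alternative: negative distance to the goal, informative at every step."""
    return -abs(position - goal)


def informative_steps(trajectory: Sequence[float], goal: float) -> int:
    """Count the steps at which the sparse reward gives any nonzero signal."""
    return sum(1 for p in trajectory if sparse_reward(p, goal) != 0.0)


# Illustrative trajectory that walks from 0.0 to the goal at 1.0.
trajectory = [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]
goal = 1.0

sparse = [sparse_reward(p, goal) for p in trajectory]
dense = [dense_reward(p, goal) for p in trajectory]
n_informative = informative_steps(trajectory, goal)
```

In this toy trajectory only the final step carries a nonzero sparse reward, whereas the shaped reward changes at every step. A policy that explores without guidance may need many such steps before seeing any signal, which is the difficulty the abstract attributes to sparse-reward settings.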
Taxonomy
Topics: Reinforcement Learning in Robotics · Advanced Bandit Algorithms Research · Optimization and Search Problems
