TL;DR
This paper introduces GILD, a meta-learned objective that uses offline data to improve online reinforcement learning, especially under sparse rewards. It plugs into diverse off-policy RL algorithms at minimal additional computational cost.
Contribution
The paper proposes GILD, a flexible meta-learning module that enhances diverse vanilla off-policy RL algorithms using offline demonstration data and introduces no domain-specific hyperparameters.
Findings
GILD significantly improves performance in MuJoCo tasks with sparse rewards.
RL algorithms enhanced with GILD outperform state-of-the-art methods.
GILD introduces minimal computational overhead.
Abstract
A major challenge in Reinforcement Learning (RL) is the difficulty of learning an optimal policy from sparse rewards. Prior works enhance online RL with conventional Imitation Learning (IL) via a handcrafted auxiliary objective, at the cost of restricting the RL policy to be sub-optimal when the offline data is generated by a non-expert policy. Instead, to better leverage valuable information in offline data, we develop Generalized Imitation Learning from Demonstration (GILD), which meta-learns an objective that distills knowledge from offline data and instills intrinsic motivation towards the optimal policy. Distinct from prior works that are exclusive to a specific RL algorithm, GILD is a flexible module intended for diverse vanilla off-policy RL algorithms. In addition, GILD introduces no domain-specific hyperparameter and minimal increase in computational cost. In four challenging…
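The contrast the abstract draws between a handcrafted imitation objective and a meta-learned one can be illustrated with a toy bi-level loop. The sketch below is not the paper's implementation: the scalar "policy" `theta`, the single meta-parameter `b`, the constants `A_DEMO` and `A_STAR`, and the quadratic stand-in for the RL objective are all hypothetical choices made only for illustration. The inner step updates the policy on an objective built from the demonstration; the outer step adjusts that objective so that the updated policy scores better on the stand-in RL loss.

```python
# Toy sketch only: every name and loss here is an illustrative assumption,
# not the paper's implementation.
A_DEMO = 0.5   # action shown in the (non-expert) offline data
A_STAR = 1.0   # action preferred by the stand-in RL objective


def train(meta_learn, steps=200, alpha=0.1, beta=0.5):
    theta = 0.0  # scalar "policy": the action it outputs
    b = 0.0      # meta-parameter of the learned objective
    for _ in range(steps):
        # Inner step: descend L_b(theta) = (theta - (A_DEMO + b))**2.
        # With b fixed at 0 this is plain imitation of the demonstration.
        grad_theta = 2.0 * (theta - (A_DEMO + b))
        theta_new = theta - alpha * grad_theta
        if meta_learn:
            # Outer step: stand-in RL loss J = (theta_new - A_STAR)**2,
            # differentiated through the inner update, where
            # d(theta_new)/d(b) = 2 * alpha.
            grad_b = 2.0 * (theta_new - A_STAR) * (2.0 * alpha)
            b -= beta * grad_b
        theta = theta_new
    return theta


imitation_only = train(meta_learn=False)  # settles at the demo action
meta_learned = train(meta_learn=True)     # moves towards A_STAR instead
```

In this toy setting the fixed imitation objective keeps the policy at the sub-optimal demonstrated action, while the meta-learned objective shifts its target until the policy reaches the action the outer loss prefers. A real off-policy algorithm would replace the quadratic outer loss with its own actor objective, which this sketch does not attempt to model.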
