Enhancing Online Reinforcement Learning with Meta-Learned Objective from Offline Data

Shilong Deng; Zetao Zheng; Hongcai He; Paul Weng; Jie Shao

arXiv:2501.07346·cs.LG·January 14, 2025

TL;DR

This paper introduces GILD, a meta-learned objective that leverages offline data to improve online reinforcement learning, especially in sparse reward settings, across various algorithms with minimal additional cost.

Contribution

The paper proposes GILD, a flexible meta-learning module that enhances diverse vanilla off-policy RL algorithms using offline demonstration data, without introducing any domain-specific hyperparameter.

Findings

1. GILD significantly improves performance in MuJoCo tasks with sparse rewards.

2. RL algorithms enhanced with GILD outperform state-of-the-art methods.

3. GILD adds minimal computational overhead.

Abstract

A major challenge in Reinforcement Learning (RL) is the difficulty of learning an optimal policy from sparse rewards. Prior works enhance online RL with conventional Imitation Learning (IL) via a handcrafted auxiliary objective, at the cost of restricting the RL policy to be sub-optimal when the offline data is generated by a non-expert policy. Instead, to better leverage valuable information in offline data, we develop Generalized Imitation Learning from Demonstration (GILD), which meta-learns an objective that distills knowledge from offline data and instills intrinsic motivation towards the optimal policy. Distinct from prior works that are exclusive to a specific RL algorithm, GILD is a flexible module intended for diverse vanilla off-policy RL algorithms. In addition, GILD introduces no domain-specific hyperparameter and minimal increase in computational cost. In four challenging…
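To make the "handcrafted auxiliary objective" used by prior work concrete, the following is a minimal sketch, assuming a behavior-cloning-style mean squared error term added to the RL loss with a fixed weight. The names `bc_loss`, `combined_loss`, and `il_weight` are illustrative assumptions and do not come from the paper or its repository. GILD replaces this kind of fixed term with a meta-learned objective, which is not shown here.

```python
from typing import Callable, Sequence

Vector = Sequence[float]


def bc_loss(
    policy: Callable[[Vector], Vector],
    states: Sequence[Vector],
    demo_actions: Sequence[Vector],
) -> float:
    """Mean squared error between policy actions and demonstrated actions."""
    total = 0.0
    count = 0
    for state, demo_action in zip(states, demo_actions):
        policy_action = policy(state)
        for predicted, demonstrated in zip(policy_action, demo_action):
            total += (predicted - demonstrated) ** 2
            count += 1
    return total / count


def combined_loss(
    rl_loss: float,
    policy: Callable[[Vector], Vector],
    states: Sequence[Vector],
    demo_actions: Sequence[Vector],
    il_weight: float,
) -> float:
    """RL loss plus a fixed-weight imitation term (hypothetical form)."""
    return rl_loss + il_weight * bc_loss(policy, states, demo_actions)


# Toy usage: a linear policy and two demonstration pairs.
toy_policy = lambda state: [2.0 * state[0]]
toy_states = [[1.0], [2.0]]
toy_demo_actions = [[2.0], [5.0]]
toy_total = combined_loss(1.0, toy_policy, toy_states, toy_demo_actions, 0.5)
```

The imitation term is minimized only when the policy reproduces the demonstrated actions, so with non-expert demonstrations a fixed weight keeps pulling the policy toward sub-optimal behavior. That is the limitation the abstract attributes to this style of objective.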

Code & Models

Repositories

sldeng1003/gild (PyTorch, official)

Videos

Enhancing Online Reinforcement Learning with Meta-Learned Objective from Offline Data (Underline)