Using Offline Data to Speed Up Reinforcement Learning in Procedurally Generated Environments
Alain Andres, Lukas Schäfer, Stefano V. Albrecht, Javier Del Ser

TL;DR
This paper investigates how offline data, used through imitation learning, can improve sample efficiency in reinforcement learning within procedurally generated environments, and reports significant improvements from only a few offline trajectories.
Contribution
The study shows that offline imitation learning, used for pre-training or concurrently with online RL, significantly boosts sample efficiency and can achieve optimal policies with very few offline trajectories.
Findings
Offline IL improves sample efficiency in RL tasks.
Pre-training with just two trajectories can enable learning of optimal policies.
Concurrent IL during RL training consistently enhances convergence to optimal policies.
Abstract
One of the key challenges of Reinforcement Learning (RL) is the ability of agents to generalise their learned policy to unseen settings. Moreover, training RL agents requires large numbers of interactions with the environment. Motivated by the recent success of Offline RL and Imitation Learning (IL), we conduct a study to investigate whether agents can leverage offline data in the form of trajectories to improve the sample-efficiency in procedurally generated environments. We consider two settings of using IL from offline data for RL: (1) pre-training a policy before online RL training and (2) concurrently training a policy with online RL and IL from offline data. We analyse the impact of the quality (optimality of trajectories) and diversity (number of trajectories and covered level) of available offline trajectories on the effectiveness of both approaches. Across four well-known…
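The two settings named in the abstract can be illustrated with a minimal sketch. It assumes behavioural cloning as the IL method and a tabular softmax policy, neither of which is confirmed by the text above; the function names, the `il_coef` weight, and the toy demonstration data are all hypothetical. The online RL gradient is left as a placeholder because the abstract does not say which RL algorithm is used.

```python
import math

def softmax(row):
    # Numerically stable softmax over one row of action logits.
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def bc_loss_and_grad(logits, demos):
    """Mean negative log-likelihood of the demonstrated actions
    (behavioural cloning, an assumed IL method) and its gradient
    with respect to the logits table."""
    grad = [[0.0] * len(row) for row in logits]
    loss = 0.0
    for state, action in demos:
        probs = softmax(logits[state])
        loss -= math.log(probs[action]) / len(demos)
        for a, p in enumerate(probs):
            target = 1.0 if a == action else 0.0
            grad[state][a] += (p - target) / len(demos)
    return loss, grad

def apply_update(logits, grad, lr):
    # Plain gradient-descent step; returns a new table.
    return [[x - lr * g for x, g in zip(row, grow)]
            for row, grow in zip(logits, grad)]

def pretrain(logits, demos, steps=200, lr=1.0):
    """Setting (1): fit the policy to offline data before online RL."""
    for _ in range(steps):
        _, grad = bc_loss_and_grad(logits, demos)
        logits = apply_update(logits, grad, lr)
    return logits

def concurrent_step(logits, rl_grad, demos, il_coef=0.5, lr=0.1):
    """Setting (2): one update mixing an online RL gradient with the
    IL gradient. il_coef is a hypothetical weighting parameter."""
    _, il_grad = bc_loss_and_grad(logits, demos)
    mixed = [[r + il_coef * i for r, i in zip(rrow, irow)]
             for rrow, irow in zip(rl_grad, il_grad)]
    return apply_update(logits, mixed, lr)

# Hypothetical offline data: two short trajectories flattened
# into (state, action) pairs over 3 states and 2 actions.
demos = [(0, 1), (1, 0), (2, 1), (0, 1), (1, 0), (2, 1)]
init_logits = [[0.0, 0.0] for _ in range(3)]

initial_loss, _ = bc_loss_and_grad(init_logits, demos)
pretrained = pretrain(init_logits, demos)
final_loss, _ = bc_loss_and_grad(pretrained, demos)

# Placeholder: a real RL gradient would come from the online
# algorithm's loss; zeros isolate the effect of the IL term.
zero_rl_grad = [[0.0, 0.0] for _ in range(3)]
mixed_logits = concurrent_step(init_logits, zero_rl_grad, demos)
mixed_loss, _ = bc_loss_and_grad(mixed_logits, demos)
```

In setting (1) the pretrained table would then serve as the initial policy for online RL. In setting (2) the IL term acts as an auxiliary loss on every update, so its weight relative to the RL gradient is a design choice this sketch leaves open.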
Taxonomy
Topics: Reinforcement Learning in Robotics · Smart Grid Energy Management · Auction Theory and Applications
