Efficient Offline Reinforcement Learning: First Imitate, then Improve

Adam Jelley; Trevor McInroe; Sam Devlin; Amos Storkey

arXiv:2406.13376 · cs.LG · December 30, 2025

TL;DR

This paper introduces a hybrid offline reinforcement learning method that combines supervised pre-training with off-policy fine-tuning, resulting in faster and more stable training on standard benchmarks.

Contribution

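The paper pre-trains an actor with behavior cloning and a critic with a supervised Monte-Carlo value error, then improves the policy with off-policy reinforcement learning. The aim is to keep the efficiency and stability of supervised training without being limited by the behavior policy that collected the dataset.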

Findings

01. Significantly reduces training time of off-policy algorithms.
02. Achieves greater stability during training.
03. Improves performance on standard benchmarks.

Abstract

Supervised imitation-based approaches are often favored over off-policy reinforcement learning approaches for learning policies offline, since their straightforward optimization objective makes them computationally efficient and stable to train. However, their performance is fundamentally limited by the behavior policy that collected the dataset. Off-policy reinforcement learning provides a promising approach for improving on the behavior policy, but training is often computationally inefficient and unstable due to temporal-difference bootstrapping. In this paper, we propose a best-of-both approach by pre-training with supervised learning before improving performance with off-policy reinforcement learning. Specifically, we demonstrate improved efficiency by pre-training an actor with behavior cloning and a critic with a supervised Monte-Carlo value error. We find that we are able to…
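The two supervised pre-training targets named in the abstract can be illustrated with a minimal sketch. It assumes complete episodes with known per-step rewards, a deterministic actor trained with a mean-squared-error behavior cloning loss, and a critic regressed onto discounted returns-to-go. The function names, the MSE form of both losses, and the example numbers are illustrative assumptions and are not taken from the paper or its repository.

```python
from typing import List, Sequence


def monte_carlo_returns(rewards: Sequence[float], gamma: float) -> List[float]:
    """Discounted return-to-go G_t = r_t + gamma * G_{t+1} for one finished episode."""
    returns = [0.0] * len(rewards)
    running = 0.0
    # Walk the episode backwards so each step reuses the return of its successor.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


def mc_value_loss(predicted: Sequence[float], targets: Sequence[float]) -> float:
    """Mean squared error between critic predictions and Monte-Carlo returns.

    Assumption: the 'supervised Monte-Carlo value error' is sketched here as
    plain MSE regression; the paper's exact form may differ.
    """
    return sum((p - g) ** 2 for p, g in zip(predicted, targets)) / len(targets)


def bc_loss(
    policy_actions: Sequence[Sequence[float]],
    dataset_actions: Sequence[Sequence[float]],
) -> float:
    """Behavior cloning loss: mean squared difference to the dataset actions.

    Assumption: a deterministic actor with continuous actions; a stochastic
    actor would typically use a negative log-likelihood instead.
    """
    squared = [
        (a - b) ** 2
        for pa, da in zip(policy_actions, dataset_actions)
        for a, b in zip(pa, da)
    ]
    return sum(squared) / len(squared)


if __name__ == "__main__":
    # Hypothetical three-step episode, used only to exercise the functions.
    targets = monte_carlo_returns([1.0, 0.0, 2.0], gamma=0.5)
    print(targets)  # [1.5, 1.0, 2.0]
    print(mc_value_loss([1.5, 1.0, 1.0], targets))
    print(bc_loss([[0.0, 1.0]], [[0.0, 0.0]]))  # 0.5
```

Neither loss bootstraps from the critic's own predictions, which is the property the abstract contrasts with temporal-difference training. In this sketch, both networks would be fitted with these fixed targets first and only then handed to an off-policy algorithm for further improvement.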

Code & Models

Repositories

adamjelley/efficientofflinerl (PyTorch, official)

Taxonomy

Topics: Reinforcement Learning in Robotics · Behavioral and Psychological Studies