Improving the Data-efficiency of Reinforcement Learning by Warm-starting with LLM

Thang Duong; Minglai Yang; Chicheng Zhang

arXiv:2505.10861·cs.LG·May 19, 2025

TL;DR

This paper presents LORO, a method that leverages Large Language Models to generate high-quality initial datasets, significantly improving the sample efficiency and performance of reinforcement learning in classical MDP environments.

Contribution

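The paper introduces LORO, which warm-starts an RL algorithm with an LLM-generated off-policy dataset that covers the state-actions visited by optimal policies, then lets the RL algorithm explore the environment and improve on the policy suggested by the LLM.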

Findings

01 LORO can converge to an optimal policy in the tested OpenAI Gym environments, such as CartPole and Pendulum.

02 LORO achieves up to 4 times the cumulative rewards of the pure RL baseline.

03 Using an LLM to collect the initial dataset improves the sample efficiency of RL.

Abstract

We investigate the use of Large Language Models (LLMs) to collect high-quality data for warm-starting Reinforcement Learning (RL) algorithms in some classical Markov Decision Process (MDP) environments. In this work, we focus on using an LLM to generate an off-policy dataset that sufficiently covers the state-actions visited by optimal policies, and then using an RL algorithm to explore the environment and improve the policy suggested by the LLM. Our algorithm, LORO, can both converge to an optimal policy and achieve high sample efficiency thanks to the LLM's good starting policy. On multiple OpenAI Gym environments, such as CartPole and Pendulum, we empirically demonstrate that LORO outperforms baseline algorithms such as pure LLM-based policies, pure RL, and a naive combination of the two, achieving up to $4 \times$ the cumulative rewards of the pure RL baseline.
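The two-phase idea in the abstract (collect an off-policy dataset with an LLM-suggested policy, then let RL continue online) can be illustrated with a minimal sketch. This is not the authors' LORO implementation: the toy chain MDP, the tabular Q-learning learner, and the `llm_policy_stub` function (a noisy heuristic standing in for an actual LLM call) are all assumptions made purely for illustration.

```python
import random

N_STATES = 6        # toy chain MDP with states 0..5; state 5 is the goal (assumption)
ACTIONS = (0, 1)    # 0 = move left, 1 = move right
GAMMA = 0.9


def step(state, action):
    """Deterministic chain dynamics: reward 1 only when the goal is reached."""
    nxt = min(state + 1, N_STATES - 1) if action == 1 else max(state - 1, 0)
    done = nxt == N_STATES - 1
    return nxt, (1.0 if done else 0.0), done


def llm_policy_stub(state, rng):
    """Hypothetical stand-in for an LLM-suggested policy: right 80% of the time."""
    return 1 if rng.random() < 0.8 else 0


def collect(policy, episodes, rng, max_steps=50):
    """Roll out a policy and return the visited transitions as a dataset."""
    data = []
    for _ in range(episodes):
        s = 0
        for _ in range(max_steps):
            a = policy(s, rng)
            s2, r, done = step(s, a)
            data.append((s, a, r, s2, done))
            s = s2
            if done:
                break
    return data


def q_update(q, transition, lr=0.5):
    """One tabular Q-learning update on a single transition."""
    s, a, r, s2, done = transition
    target = r if done else r + GAMMA * max(q[s2])
    q[s][a] += lr * (target - q[s][a])


def greedy_policy(q):
    """Greedy action for every state under the current Q-table."""
    return [max(ACTIONS, key=lambda a: q[s][a]) for s in range(N_STATES)]


def warm_start_then_finetune(seed=0, offline_passes=20, online_episodes=20):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]

    # Phase 1: warm start from the dataset gathered by the stand-in policy.
    buffer = collect(llm_policy_stub, episodes=10, rng=rng)
    for _ in range(offline_passes):
        for transition in buffer:
            q_update(q, transition)

    # Phase 2: online epsilon-greedy exploration to keep improving the policy.
    def eps_greedy(s, rng, eps=0.1):
        if rng.random() < eps:
            return rng.choice(ACTIONS)
        return max(ACTIONS, key=lambda a: q[s][a])

    for _ in range(online_episodes):
        s = 0
        for _ in range(50):
            a = eps_greedy(s, rng)
            s2, r, done = step(s, a)
            q_update(q, (s, a, r, s2, done))
            s = s2
            if done:
                break
    return q
```

Calling `greedy_policy(warm_start_then_finetune())` returns the action the learner prefers in each state. In this sketch the dataset only needs to cover the state-actions along a good trajectory for the online phase to start from a useful policy, which mirrors the coverage condition the abstract describes.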

Code & Models

Repositories

duongnhatthang/llamagym (Official)

Taxonomy

Topics: Reinforcement Learning in Robotics · Machine Learning and Algorithms · Machine Learning and Data Classification
