The Synergy of LLMs & RL Unlocks Offline Learning of Generalizable Language-Conditioned Policies with Low-fidelity Data

Thomas Pouplin, Katarzyna Kobalczyk, Hao Sun, Mihaela van der Schaar

arXiv:2412.06877 · cs.CL · June 9, 2025

Open Access

TL;DR

This paper presents TEDUO, an offline training pipeline that uses large language models to learn data-efficient, generalizable language-conditioned policies in symbolic environments, addressing the difficulty traditional RL methods have in generalizing to unseen goals and states.

Contribution

TEDUO uses LLMs in two roles, as annotation tools that augment unlabeled offline datasets and as generalizable instruction followers, improving offline policy learning in complex environments with little labeled data.

Findings

01

Achieves robust, generalizable policies with limited data

02

Outperforms traditional RL and standalone LLMs in complex tasks

03

Demonstrates effective offline learning in symbolic environments

Abstract

Developing autonomous agents capable of performing complex, multi-step decision-making tasks specified in natural language remains a significant challenge, particularly in realistic settings where labeled data is scarce and real-time experimentation is impractical. Existing reinforcement learning (RL) approaches often struggle to generalize to unseen goals and states, limiting their applicability. In this paper, we introduce TEDUO, a novel training pipeline for offline language-conditioned policy learning in symbolic environments. Unlike conventional methods, TEDUO operates on readily available, unlabeled datasets and addresses the challenge of generalization to previously unseen goals and states. Our approach harnesses large language models (LLMs) in a dual capacity: first, as automatization tools augmenting offline datasets with richer annotations, and second, as generalizable…
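The abstract names two ingredients: turning unlabeled offline data into goal-annotated data, and learning language-conditioned policies from that data without further environment interaction. The sketch below is a generic illustration of those two ideas, not TEDUO's actual pipeline. It assumes a discrete symbolic state space, a hypothetical `label_goals` annotator standing in for the LLM, and swaps in plain tabular offline Q-learning with sparse hindsight rewards as the policy learner; `relabel`, `offline_q_learning`, and `greedy_policy` are illustrative names.

```python
from collections import defaultdict
from typing import Callable, Dict, Hashable, List, Set, Tuple

State = Hashable
# Unlabeled transition: (state, action, next_state); no reward, no goal.
Transition = Tuple[State, int, State]


def relabel(transitions: List[Transition],
            label_goals: Callable[[State], Set[str]]) -> Dict[str, List[tuple]]:
    """Build one reward-labeled dataset per goal from unlabeled transitions.

    `label_goals` is a hypothetical annotator (a stand-in for an LLM) that
    returns the natural-language goals a given state satisfies.
    """
    # Candidate goals: every goal that some observed next state satisfies.
    goals: Set[str] = set()
    for _, _, s_next in transitions:
        goals |= label_goals(s_next)

    datasets: Dict[str, List[tuple]] = {}
    for g in goals:
        rows = []
        for s, a, s_next in transitions:
            reached = g in label_goals(s_next)
            # Sparse hindsight reward: 1 when the transition reaches goal g.
            rows.append((s, a, 1.0 if reached else 0.0, s_next, reached))
        datasets[g] = rows
    return datasets


def offline_q_learning(rows, n_actions, gamma=0.9, sweeps=100, lr=0.5):
    """Tabular Q-learning swept repeatedly over a fixed dataset (no env access)."""
    q = defaultdict(lambda: [0.0] * n_actions)
    for _ in range(sweeps):
        for s, a, r, s_next, done in rows:
            target = r if done else r + gamma * max(q[s_next])
            q[s][a] += lr * (target - q[s][a])
    return q


def greedy_policy(q):
    """Map each state seen in the data to its highest-valued action."""
    return {s: max(range(len(v)), key=v.__getitem__) for s, v in q.items()}


if __name__ == "__main__":
    # Toy 4-state corridor: action 0 moves left, action 1 moves right.
    data = [(s, 1, min(s + 1, 3)) for s in range(3)]
    data += [(s, 0, max(s - 1, 0)) for s in range(1, 4)]

    def goals_of(state):
        return {"go to the end of the corridor"} if state == 3 else set()

    per_goal = relabel(data, goals_of)
    q = offline_q_learning(per_goal["go to the end of the corridor"], n_actions=2)
    print(greedy_policy(q))
```

In this toy corridor the learned policy moves right from states 0, 1, and 2. Relabeling per goal keeps each policy-learning problem a standard single-reward MDP; a real system would need function approximation and a way to generalize across goals not present in the data, which the tabular sketch does not attempt.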

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Text Readability and Simplification