The Synergy of LLMs & RL Unlocks Offline Learning of Generalizable Language-Conditioned Policies with Low-fidelity Data
Thomas Pouplin, Katarzyna Kobalczyk, Hao Sun, Mihaela van der Schaar

TL;DR
This paper presents TEDUO, an offline training pipeline that uses large language models to learn data-efficient, language-conditioned policies in symbolic environments, targeting the poor generalization of conventional RL methods to unseen goals and states.
Contribution
TEDUO introduces a dual-role use of LLMs for dataset augmentation and instruction following, improving offline policy learning in low-data, complex environments.
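To make the dataset-augmentation role concrete, here is a minimal sketch of one way an unlabeled offline dataset could be annotated with goal-conditioned rewards. It is an illustration under stated assumptions, not the authors' implementation: the names `Transition`, `LabeledTransition`, `annotate`, and `toy_achieves` are hypothetical, and `toy_achieves` is a hand-written rule standing in for the LLM call that would judge whether a state satisfies a natural-language goal.

```python
from typing import Callable, List, NamedTuple


class Transition(NamedTuple):
    """One unlabeled step from an offline dataset (hypothetical schema)."""
    state: dict
    action: str
    next_state: dict


class LabeledTransition(NamedTuple):
    """The same step, annotated with a goal and a binary reward."""
    state: dict
    action: str
    next_state: dict
    goal: str
    reward: float


def annotate(
    transitions: List[Transition],
    goals: List[str],
    achieves: Callable[[dict, str], bool],
) -> List[LabeledTransition]:
    """Pair every transition with every goal and mark goal achievement.

    `achieves` is the annotator. In an LLM-based pipeline it would wrap a
    model query; here it is any callable (state, goal) -> bool.
    """
    labeled = []
    for t in transitions:
        for goal in goals:
            reward = 1.0 if achieves(t.next_state, goal) else 0.0
            labeled.append(
                LabeledTransition(t.state, t.action, t.next_state, goal, reward)
            )
    return labeled


def toy_achieves(state: dict, goal: str) -> bool:
    """Hypothetical rule-based stand-in for an LLM judgment."""
    return goal == "pick up the key" and state.get("holding") == "key"


data = [
    Transition({"holding": None}, "pickup", {"holding": "key"}),
    Transition({"holding": None}, "forward", {"holding": None}),
]
labeled = annotate(data, ["pick up the key"], toy_achieves)
```

The resulting goal-labeled tuples are the kind of input an offline goal-conditioned RL method could consume; how TEDUO itself structures its annotations is not specified in this summary.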
Findings
Achieves robust, generalizable policies with limited data
Outperforms traditional RL and standalone LLMs in complex tasks
Demonstrates effective offline learning in symbolic environments
Abstract
Developing autonomous agents capable of performing complex, multi-step decision-making tasks specified in natural language remains a significant challenge, particularly in realistic settings where labeled data is scarce and real-time experimentation is impractical. Existing reinforcement learning (RL) approaches often struggle to generalize to unseen goals and states, limiting their applicability. In this paper, we introduce TEDUO, a novel training pipeline for offline language-conditioned policy learning in symbolic environments. Unlike conventional methods, TEDUO operates on readily available, unlabeled datasets and addresses the challenge of generalization to previously unseen goals and states. Our approach harnesses large language models (LLMs) in a dual capacity: first, as automatization tools augmenting offline datasets with richer annotations, and second, as generalizable…
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Text Readability and Simplification
