Unleashing the Power of Pre-trained Language Models for Offline Reinforcement Learning
Ruizhe Shi, Yuyao Liu, Yanjie Ze, Simon S. Du, Huazhe Xu

TL;DR
This paper introduces LaMo, a framework that initializes Decision Transformers from pre-trained language models for offline reinforcement learning, improving performance especially when in-domain data is limited.
Contribution
LaMo is the first framework to adapt pre-trained language models for offline RL, using LoRA fine-tuning and an auxiliary language prediction loss to improve learning from limited data.
Findings
LaMo outperforms existing offline RL methods in sparse-reward tasks.
LaMo narrows the performance gap between Decision Transformers and value-based offline RL methods in dense-reward tasks.
LaMo is especially effective with limited data samples.
Abstract
Offline reinforcement learning (RL) aims to find a near-optimal policy using pre-collected datasets. In real-world scenarios, data collection could be costly and risky; therefore, offline RL becomes particularly challenging when the in-domain data is limited. Given recent advances in Large Language Models (LLMs) and their few-shot learning prowess, this paper introduces Language Models for Motion Control (LaMo), a general framework based on Decision Transformers to effectively use pre-trained Language Models (LMs) for offline RL. Our framework highlights four crucial components: (1) initializing Decision Transformers with sequentially pre-trained LMs, (2) employing the LoRA fine-tuning method, in contrast to full-weight fine-tuning, to combine the pre-trained knowledge from LMs and in-domain knowledge effectively, (3) using the non-linear MLP…
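The LoRA fine-tuning named in component (2) can be illustrated with a minimal sketch. It assumes the standard LoRA formulation, in which a frozen pre-trained weight W is adapted as W + (alpha / r) · B · A with only the low-rank factors A and B trained. It is not the authors' implementation: the class `LoRALinear`, the helper `matmul`, and all parameter names are hypothetical and chosen for illustration only.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


class LoRALinear:
    """Frozen weight W (d_out x d_in) plus a trainable low-rank update B @ A.

    Hypothetical illustration of the standard LoRA idea, not LaMo's code.
    """

    def __init__(self, weight, rank, alpha=1.0, seed=0):
        rng = random.Random(seed)
        d_out, d_in = len(weight), len(weight[0])
        self.weight = weight  # pre-trained weight, kept frozen
        self.scale = alpha / rank
        # A starts small and random, B starts at zero, so the adapted layer
        # initially behaves exactly like the pre-trained one.
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(rank)]
        self.B = [[0.0] * rank for _ in range(d_out)]

    def effective_weight(self):
        """Return W + scale * (B @ A)."""
        delta = matmul(self.B, self.A)
        return [[w + self.scale * d for w, d in zip(w_row, d_row)]
                for w_row, d_row in zip(self.weight, delta)]

    def forward(self, x):
        """Apply the adapted layer to one input vector x of length d_in."""
        return [sum(w * xi for w, xi in zip(row, x)) for row in self.effective_weight()]

    def trainable_parameter_count(self):
        """Count entries of A and B only; W is not trained."""
        return len(self.A) * len(self.A[0]) + len(self.B) * len(self.B[0])
```

With rank r much smaller than the layer dimensions, the trainable parameter count r · (d_in + d_out) is far below the d_in · d_out entries of the full weight, which is the contrast with full-weight fine-tuning that the abstract draws.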
Taxonomy
Topics: Reinforcement Learning in Robotics · Artificial Intelligence in Healthcare and Education · Explainable Artificial Intelligence (XAI)
