True Knowledge Comes from Practice: Aligning LLMs with Embodied Environments via Reinforcement Learning

Weihao Tan; Wentao Zhang; Shanqi Liu; Longtao Zheng; Xinrun Wang; Bo An

arXiv:2401.14151 · cs.LG · March 12, 2024 · 2 citations

TL;DR

This paper introduces TWOSOME, a framework that deploys large language models as decision-making agents in reinforcement learning environments. It improves sample efficiency and generalization while preserving the models' original capabilities.

Contribution

TWOSOME is a novel online framework that aligns LLMs with embodied environments using RL, featuring a parameter-efficient training architecture and enhanced policy stability.

Findings

01  TWOSOME outperforms PPO and SayCan in decision-making tasks.

02  It demonstrates superior generalization to unseen tasks.

03  It preserves LLMs' original abilities during training.

Abstract

Despite their impressive performance across numerous tasks, large language models (LLMs) often fail to solve simple decision-making tasks because the knowledge in LLMs is misaligned with environments. In contrast, reinforcement learning (RL) agents learn policies from scratch, which keeps them aligned with environments but makes it difficult to incorporate prior knowledge for efficient exploration. To narrow the gap, we propose TWOSOME, a novel general online framework that deploys LLMs as decision-making agents to efficiently interact and align with embodied environments via RL without requiring any prepared datasets or prior knowledge of the environments. First, we query the joint probabilities of each valid action with LLMs to form behavior policies. Then, to enhance the stability and robustness of the policies, we propose two normalization methods and summarize four prompt…
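The first step in the abstract, forming a behavior policy from the joint probability the LLM assigns to each valid action, can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function name `action_policy`, the hard-coded per-token log-probabilities, and the action strings are hypothetical, and the length-based normalization is only an assumed example of what a normalization step might look like, since the abstract is truncated before describing the two methods.

```python
import math
from typing import Dict, List


def action_policy(
    token_logprobs: Dict[str, List[float]],
    normalize_by_length: bool = False,
) -> Dict[str, float]:
    """Turn per-token log-probs of each action string into a policy.

    token_logprobs maps an action (text) to the log-probabilities an LLM
    assigned to its tokens. These are hypothetical inputs; a real system
    would obtain them by scoring each action under the prompt.
    """
    scores = {}
    for action, logps in token_logprobs.items():
        # Log of the joint probability = sum of the token log-probs.
        joint = sum(logps)
        if normalize_by_length:
            # Assumed normalization: average per token, so that longer
            # action strings are not penalized just for their length.
            joint /= len(logps)
        scores[action] = joint

    # Numerically stable softmax over the valid actions.
    top = max(scores.values())
    exps = {a: math.exp(s - top) for a, s in scores.items()}
    total = sum(exps.values())
    return {a: v / total for a, v in exps.items()}


# Hypothetical scores: a four-token action and a one-token action.
example = {
    "pick up the tomato": [-0.5, -0.5, -0.5, -0.5],
    "chop": [-1.5],
}
raw_policy = action_policy(example)
normalized_policy = action_policy(example, normalize_by_length=True)
```

With these made-up numbers, the unnormalized policy prefers the shorter action because a product of many token probabilities shrinks with length, whereas the length-normalized variant prefers the longer one. That kind of length bias is one plausible reason a normalization step would improve policy stability.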

Code & Models

Repositories

weihaotan/twosome
PyTorch · Official

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Speech and dialogue systems

Methods: Entropy Regularization · ALIGN · Proximal Policy Optimization