Language Models are Few-Shot Butlers

Vincent Micheli; François Fleuret

arXiv:2104.07972·cs.CL·September 21, 2021

TL;DR

This paper presents a two-stage method for improving language models' performance in text-based environments: fine-tuning on a small set of expert demonstrations, then reinforcement learning through environment interaction. It reports a 51% absolute improvement in success rate over existing methods in ALFWorld.

Contribution

It introduces a two-stage learning procedure that first fine-tunes a language model on a limited set of expert demonstrations and then improves it further through interaction with the environment.

Findings

01

51% absolute improvement in success rate over existing methods in the ALFWorld environment

02

Effective learning from only 1.2% of expert demonstrations

03

Fine-tuning on demonstrations followed by a simple reinforcement learning algorithm outperforms existing methods

Abstract

Pretrained language models demonstrate strong performance in most NLP tasks when fine-tuned on small task-specific datasets. Hence, these autoregressive models constitute ideal agents to operate in text-based environments where language understanding and generative capabilities are essential. Nonetheless, collecting expert demonstrations in such environments is a time-consuming endeavour. We introduce a two-stage procedure to learn from a small set of demonstrations and further improve by interacting with an environment. We show that language models fine-tuned with only 1.2% of the expert demonstrations and a simple reinforcement learning algorithm achieve a 51% absolute improvement in success rate over existing methods in the ALFWorld environment.
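The two-stage idea in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: a small tabular softmax policy stands in for the language model, a hypothetical three-step task stands in for ALFWorld, and REINFORCE with a 0/1 success reward is assumed as the "simple reinforcement learning algorithm" (the abstract does not name one). The action strings, the `EXPERT_PLAN`, and the choice to let the demonstrations cover only the first two steps are all illustrative assumptions.

```python
import math
import random

# Hypothetical text actions and a hypothetical 3-step task (not from ALFWorld).
ACTIONS = ["take apple", "open fridge", "put apple in fridge", "look"]
EXPERT_PLAN = ["take apple", "open fridge", "put apple in fridge"]


def softmax(logits):
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def new_policy():
    # One row of action logits per state; all zeros means a uniform policy.
    return [[0.0] * len(ACTIONS) for _ in EXPERT_PLAN]


def policy_gradient_update(policy, state, action_idx, weight, lr):
    # Gradient ascent on weight * log pi(action | state) for a softmax policy.
    probs = softmax(policy[state])
    for i in range(len(ACTIONS)):
        grad = (1.0 if i == action_idx else 0.0) - probs[i]
        policy[state][i] += lr * weight * grad


def behaviour_cloning(policy, demos, epochs=50, lr=0.5):
    # Stage 1: maximise the likelihood of the demonstrated actions.
    for _ in range(epochs):
        for state, action in demos:
            policy_gradient_update(policy, state, ACTIONS.index(action), 1.0, lr)


def run_episode(policy, rng):
    # Sample actions until the plan is completed or a wrong action ends the episode.
    trajectory = []
    for state, correct in enumerate(EXPERT_PLAN):
        probs = softmax(policy[state])
        idx = rng.choices(range(len(ACTIONS)), weights=probs)[0]
        trajectory.append((state, idx))
        if ACTIONS[idx] != correct:
            return trajectory, 0.0
    return trajectory, 1.0


def reinforce(policy, episodes=500, lr=0.5, seed=0):
    # Stage 2 (assumed REINFORCE): reinforce the actions of successful episodes.
    rng = random.Random(seed)
    for _ in range(episodes):
        trajectory, reward = run_episode(policy, rng)
        for state, idx in trajectory:
            policy_gradient_update(policy, state, idx, reward, lr)


def success_probability(policy):
    # Exact probability that the policy completes the whole plan.
    prob = 1.0
    for state, correct in enumerate(EXPERT_PLAN):
        prob *= softmax(policy[state])[ACTIONS.index(correct)]
    return prob


# The demonstrations deliberately cover only the first two steps of the task.
demos = [(0, "take apple"), (1, "open fridge")]
policy = new_policy()
behaviour_cloning(policy, demos)
bc_rate = success_probability(policy)
reinforce(policy)
rl_rate = success_probability(policy)
print(f"after demonstrations: {bc_rate:.2f}, after interaction: {rl_rate:.2f}")
```

In this toy setting the cloned policy cannot exceed a 25% success probability, because it has never seen the final step and picks among four actions uniformly there. Interaction with the environment is what fills that gap, which mirrors the motivation for the second stage.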

Code & Models

Repositories

vmicheli/lm-butlers
PyTorch · Official

Models

🤗 vmicheli/lm-butlers-gpt
model · 3 downloads