Policy Improvement using Language Feedback Models

Victor Zhong; Dipendra Misra; Xingdi Yuan; Marc-Alexandre Côté

arXiv:2402.07876 · cs.LG · October 11, 2024 · 1 citation

TL;DR

This paper presents Language Feedback Models (LFMs), which are trained on feedback from large language models to identify desirable behavior to imitate in instruction-following tasks, improving task-completion rates and generalization to unseen environments.

Contribution

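The paper introduces LFMs, trained on LLM-generated feedback, that select which behavior to imitate in imitation learning. LFMs outperform using LLMs to predict actions directly when the number of LLM output tokens is controlled for, and they can be modified to provide human-interpretable feedback.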

Findings

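01. Improved task-completion rate over strong behavioural cloning baselines on three language grounding environments (Touchdown, ScienceWorld, and ALFWorld).

02. LFMs outperform using LLMs as experts to directly predict actions, when controlling for the number of LLM output tokens.

03. LFMs generalize to unseen environments, improving task-completion rate by 3.5-12.0% through one round of adaptation.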

Abstract

We introduce Language Feedback Models (LFMs) that identify desirable behaviour - actions that help achieve tasks specified in the instruction - for imitation learning in instruction following. To train LFMs, we obtain feedback from Large Language Models (LLMs) on visual trajectories verbalized to language descriptions. First, by using LFMs to identify desirable behaviour to imitate, we improve in task-completion rate over strong behavioural cloning baselines on three distinct language grounding environments (Touchdown, ScienceWorld, and ALFWorld). Second, LFMs outperform using LLMs as experts to directly predict actions, when controlling for the number of LLM output tokens. Third, LFMs generalize to unseen environments, improving task-completion rate by 3.5-12.0% through one round of adaptation. Finally, LFM can be modified to provide human-interpretable feedback without performance…
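The selection step described in the abstract (keep only the steps a feedback model marks as desirable, then imitate them) might be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the `Step` record, `select_desirable`, and the keyword-based `toy_feedback_model` are all hypothetical names and stand-ins, and a real LFM would be a learned model trained on LLM feedback rather than a string match.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Step:
    """One step of a rollout (hypothetical record layout)."""
    instruction: str   # task instruction given to the policy
    observation: str   # observation verbalized as a language description
    action: str        # action the policy took at this step


def select_desirable(
    rollouts: List[List[Step]],
    feedback_model: Callable[[Step], bool],
) -> List[Tuple[str, str]]:
    """Keep (input, action) pairs for steps the feedback model marks desirable.

    The returned pairs could then serve as imitation-learning targets.
    """
    dataset = []
    for rollout in rollouts:
        for step in rollout:
            if feedback_model(step):
                model_input = f"{step.instruction} | {step.observation}"
                dataset.append((model_input, step.action))
    return dataset


def toy_feedback_model(step: Step) -> bool:
    """Stand-in for an LFM (assumption): an action counts as desirable
    if it mentions any word from the instruction."""
    words = step.instruction.lower().split()
    return any(word in step.action.lower() for word in words)


rollouts = [[
    Step("heat the mug", "You see a mug and a fridge.", "take mug"),
    Step("heat the mug", "You hold a mug.", "open fridge"),
]]
dataset = select_desirable(rollouts, toy_feedback_model)
print(dataset)
```

Passing the feedback model in as a plain callable keeps the filtering logic independent of how feedback is produced, so the toy heuristic could be swapped for a trained classifier without changing `select_desirable`.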

Code & Models

Repositories

vzhong/language_feedback_models (PyTorch, official)

Videos

Policy Improvement using Language Feedback Models (SlidesLive)

Taxonomy

Topics: Economic Policies and Impacts