Policy Improvement using Language Feedback Models
Victor Zhong, Dipendra Misra, Xingdi Yuan, Marc-Alexandre Côté

TL;DR
This paper presents Language Feedback Models (LFMs), which are trained on feedback from large language models to identify desirable behaviour to imitate in instruction-following tasks, improving task completion and generalization.
Contribution
The paper introduces LFMs, trained on LLM feedback over trajectories verbalized to language descriptions, to select which behaviour to imitate. They outperform using LLMs to predict actions directly and can be modified to provide human-interpretable feedback.
Findings
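Improved task-completion rate over strong behavioural cloning baselines on three language grounding environments (Touchdown, ScienceWorld, and ALFWorld).
LFMs outperform using LLMs as experts to directly predict actions, when controlling for the number of LLM output tokens.
LFMs generalize to unseen environments, improving task-completion rate by 3.5-12.0% through one round of adaptation.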
Abstract
We introduce Language Feedback Models (LFMs) that identify desirable behaviour (actions that help achieve tasks specified in the instruction) for imitation learning in instruction following. To train LFMs, we obtain feedback from Large Language Models (LLMs) on visual trajectories verbalized to language descriptions. First, by using LFMs to identify desirable behaviour to imitate, we improve in task-completion rate over strong behavioural cloning baselines on three distinct language grounding environments (Touchdown, ScienceWorld, and ALFWorld). Second, LFMs outperform using LLMs as experts to directly predict actions, when controlling for the number of LLM output tokens. Third, LFMs generalize to unseen environments, improving task-completion rate by 3.5-12.0% through one round of adaptation. Finally, LFMs can be modified to provide human-interpretable feedback without performance…
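As a rough illustration of the idea in the abstract, the sketch below filters policy rollouts down to the steps a feedback model marks as desirable, yielding examples that could then be used for imitation learning. This is a minimal sketch under stated assumptions, not the authors' implementation: the `Step` record, the `FeedbackModel` interface, `select_desirable`, and the keyword-based `toy_feedback` stand-in are all hypothetical names introduced here, and a real LFM would be a trained model rather than a rule.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Step:
    """One step of a rollout (hypothetical record, not from the paper)."""
    instruction: str  # task instruction
    observation: str  # observation verbalized to a language description
    action: str       # action the policy took at this step


# Assumed interface: returns True when the action is judged desirable.
FeedbackModel = Callable[[Step], bool]


def select_desirable(
    rollouts: List[List[Step]], feedback: FeedbackModel
) -> List[Tuple[str, str, str]]:
    """Keep only steps the feedback model marks as desirable.

    Returns (instruction, observation, action) triples that a
    behavioural cloning learner could imitate.
    """
    return [
        (step.instruction, step.observation, step.action)
        for trajectory in rollouts
        for step in trajectory
        if feedback(step)
    ]


def toy_feedback(step: Step) -> bool:
    # Stand-in keyword rule for illustration only, NOT a trained LFM.
    return "kitchen" in step.action


rollouts = [
    [
        Step("put a mug in the kitchen", "You are in the hallway.", "go to kitchen"),
        Step("put a mug in the kitchen", "You are in the hallway.", "open closet"),
    ]
]
dataset = select_desirable(rollouts, toy_feedback)
```

Here only the first step survives the filter, so `dataset` holds a single imitation example. Swapping `toy_feedback` for a learned model would leave the filtering logic unchanged.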
Taxonomy
Topics: Economic Policies and Impacts
