Learning to Learn Faster from Human Feedback with Language Model Predictive Control

Jacky Liang; Fei Xia; Wenhao Yu; Andy Zeng; Montserrat Gonzalez Arenas; Maria Attarian; Maria Bauza; Matthew Bennice; Alex Bewley; Adil Dostmohamed; Chuyuan Kelly Fu; Nimrod Gileadi; Marissa Giustina; Keerthana Gopalakrishnan; Leonard Hasenclever; Jan Humplik; Jasmine Hsu; Nikhil Joshi; Ben Jyenis; Chase Kew; Sean Kirmani; Tsang-Wei Edward Lee; Kuang-Huei Lee; Assaf Hurwitz Michaely; Joss Moore; Ken Oslund; Dushyant Rao; Allen Ren; Baruch Tabanpour; Quan Vuong; Ayzaan Wahid; Ted Xiao; Ying Xu; Vincent Zhuang; Peng Xu; Erik Frey; Ken Caluwaerts; Tingnan Zhang; Brian Ichter; Jonathan Tompson; Leila Takayama; Vincent Vanhoucke; Izhak Shafran; Maja Mataric; Dorsa Sadigh; Nicolas Heess; Kanishka Rao; Nik Stewart; Jie Tan; Carolina Parada

arXiv:2402.11450·cs.RO·June 3, 2024·1 cites

Open Access

TL;DR

This paper introduces Language Model Predictive Control (LMPC), a framework for fine-tuning robot code-writing LLMs on their in-context interactions with human users. By modeling those interactions as a partially observable Markov decision process, LMPC improves teachability and task success rates.

Contribution

The paper proposes LMPC, combining fine-tuned LLMs with model predictive control to better remember interactions and improve robot teaching efficiency across multiple tasks and embodiments.

Findings

01. 26.9% increase in success rate for unseen tasks
02. Average number of human corrections reduced from 2.4 to 1.9
03. 31.5% improvement in in-context learning success rate

Abstract

Large language models (LLMs) have been shown to exhibit a wide range of capabilities, such as writing robot code from language commands -- enabling non-experts to direct robot behaviors, modify them based on feedback, or compose them to perform new tasks. However, these capabilities (driven by in-context learning) are limited to short-term interactions, where users' feedback remains relevant for only as long as it fits within the context size of the LLM, and can be forgotten over longer interactions. In this work, we investigate fine-tuning the robot code-writing LLMs to remember their in-context interactions and improve their teachability, i.e., how efficiently they adapt to human inputs (measured by the average number of corrections before the user considers the task successful). Our key observation is that when human-robot interactions are viewed as a partially observable Markov decision…
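
The teachability measure described in the abstract (average number of corrections before the user considers the task successful) can be illustrated with a small sketch. The `Session` record, the function names, and the sample data below are hypothetical and are not taken from the paper. The sketch also assumes that the average is computed over successful sessions only; the paper may aggregate differently.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Session:
    """Hypothetical record of one teaching session (not from the paper)."""
    corrections: int  # corrective feedback messages the user gave
    success: bool     # whether the user judged the task successful


def success_rate(sessions: List[Session]) -> float:
    """Fraction of sessions that the user marked successful."""
    if not sessions:
        raise ValueError("need at least one session")
    return sum(s.success for s in sessions) / len(sessions)


def mean_corrections_to_success(sessions: List[Session]) -> Optional[float]:
    """Average corrections over successful sessions (an assumed convention).

    Returns None when no session succeeded, since the average is undefined.
    """
    counts = [s.corrections for s in sessions if s.success]
    if not counts:
        return None
    return sum(counts) / len(counts)


# Made-up sample data for illustration only.
sessions = [
    Session(corrections=1, success=True),
    Session(corrections=3, success=True),
    Session(corrections=2, success=True),
    Session(corrections=5, success=False),
]

print(success_rate(sessions))                 # 0.75
print(mean_corrections_to_success(sessions))  # 2.0
```

Under this convention, a lower mean number of corrections at an equal or higher success rate indicates a more teachable model, which is the direction of the reported change from 2.4 to 1.9.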

Taxonomy

Topics: Intelligent Tutoring Systems and Adaptive Learning