Open-Ended Instructable Embodied Agents with Memory-Augmented Large Language Models

Gabriel Sarch; Yue Wu; Michael J. Tarr; Katerina Fragkiadaki

arXiv:2310.15127·cs.AI·November 21, 2023·1 cites

Open Access

TL;DR

HELPER is an embodied agent that uses a memory-augmented large language model to parse natural language dialogue into action programs and adapt to individual users' routines, setting a new state of the art on the TEACh benchmark.

Contribution

This paper introduces HELPER, a memory-augmented LLM framework that personalizes robot instruction parsing by retrieving stored language-program pairs as in-context prompt examples, improving performance on the TEACh benchmark.

Findings

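01 Achieved a 1.7x improvement over the previous state of the art on the TEACh Trajectory from Dialogue (TfD) benchmark

02 Set a new state of the art on the TEACh benchmark in both Execution from Dialog History (EDH) and TfD

03 Personalizes instruction parsing to individual users by retrieving from an external memory of language-program pairs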

Abstract

Pre-trained and frozen large language models (LLMs) can effectively map simple scene rearrangement instructions to programs over a robot's visuomotor functions through appropriate few-shot example prompting. To parse open-domain natural language and adapt to a user's idiosyncratic procedures, not known during prompt engineering time, fixed prompts fall short. In this paper, we introduce HELPER, an embodied agent equipped with an external memory of language-program pairs that parses free-form human-robot dialogue into action programs through retrieval-augmented LLM prompting: relevant memories are retrieved based on the current dialogue, instruction, correction, or VLM description, and used as in-context prompt examples for LLM querying. The memory is expanded during deployment to include pairs of user's language and action plans, to assist future inferences and personalize them to the…
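The retrieval-augmented prompting loop described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the bag-of-words similarity stands in for whatever learned text embedding the system uses, and the class `ProgramMemory`, the helper `build_prompt`, and the robot calls inside the stored programs (`find`, `pickup`, `place`, `toggle_on`, `slice_object`, `pour`) are all hypothetical names introduced here. The LLM call itself is omitted; the sketch stops at the assembled prompt.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy bag-of-words vector (assumption: stands in for a learned embedding)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class ProgramMemory:
    """External memory of (language, program) pairs, keyed by the language."""

    def __init__(self):
        self.entries = []  # list of (language, program, vector)

    def add(self, language, program):
        self.entries.append((language, program, embed(language)))

    def retrieve(self, query, k=2):
        """Return the k stored pairs whose language is most similar to the query."""
        q = embed(query)
        ranked = sorted(self.entries, key=lambda e: cosine(q, e[2]), reverse=True)
        return [(lang, prog) for lang, prog, _ in ranked[:k]]


def build_prompt(memory, query, k=2):
    """Use retrieved pairs as in-context examples, then append the new input."""
    parts = [f"Input: {lang}\nProgram:\n{prog}\n" for lang, prog in memory.retrieve(query, k)]
    parts.append(f"Input: {query}\nProgram:")
    return "\n".join(parts)


# Seed the memory with a few pairs (program bodies use hypothetical robot calls).
memory = ProgramMemory()
memory.add("make me a cup of coffee",
           "mug = find('Mug')\npickup(mug)\nplace(mug, 'CoffeeMachine')\ntoggle_on('CoffeeMachine')")
memory.add("slice the tomato with a knife",
           "knife = find('Knife')\npickup(knife)\nslice_object(find('Tomato'))")
memory.add("water the plant",
           "cup = find('Cup')\npickup(cup)\npour(cup, find('HousePlant'))")

prompt = build_prompt(memory, "please make coffee in a clean mug", k=1)
print(prompt)

# During deployment, a user's own phrasing and the plan that worked are stored,
# so later requests in that user's words retrieve the personalized plan.
memory.add("make my usual breakfast",
           "mug = find('Mug')\npickup(mug)\nplace(mug, 'CoffeeMachine')\ntoggle_on('CoffeeMachine')\nslice_object(find('Bread'))")
```

In this sketch, personalization amounts to appending a new pair to the same memory that retrieval reads from, which matches the abstract's description of memory being expanded during deployment.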

Taxonomy

Topics: Multimodal Machine Learning Applications · Topic Modeling · Human Pose and Action Recognition