OPEx: A Component-Wise Analysis of LLM-Centric Agents in Embodied Instruction Following

Haochen Shi; Zhiyuan Sun; Xingdi Yuan; Marc-Alexandre Côté; Bang Liu

arXiv:2403.03017·cs.AI·March 6, 2024·1 cites

Open Access

TL;DR

This paper introduces OPEx, a framework for analyzing LLM-centric agents in embodied instruction following, highlighting key components and demonstrating that multi-agent strategies significantly improve task performance.

Contribution

It provides a unified analysis of core components affecting EIF performance and proposes a multi-agent dialogue approach to enhance outcomes.

Findings

1. LLM-centric design improves EIF performance
2. Visual perception and action execution are bottlenecks
3. Multi-agent strategies further boost task success

Abstract

Embodied Instruction Following (EIF) is a crucial task in embodied learning, requiring agents to interact with their environment through egocentric observations to fulfill natural language instructions. Recent advancements have seen a surge in employing large language models (LLMs) within a framework-centric approach to enhance performance in embodied learning tasks, including EIF. Despite these efforts, there exists a lack of a unified understanding regarding the impact of various components, ranging from visual perception to action execution, on task performance. To address this gap, we introduce OPEx, a comprehensive framework that delineates the core components essential for solving embodied learning tasks: Observer, Planner, and Executor. Through extensive evaluations, we provide a deep analysis of how each component influences EIF task performance. Furthermore, we innovate within…

Taxonomy

Topics: Natural Language Processing Techniques · Human Pose and Action Recognition · Human Motion and Animation