Retrieval-Augmented Code Generation for Situated Action Generation: A Case Study on Minecraft

Chalamalasetti Kranti; Sherzod Hakimov; David Schlangen

arXiv:2406.17553·cs.CL·June 26, 2024

TL;DR

This paper explores using large language models with few-shot prompting to predict builder actions in Minecraft collaborative tasks, demonstrating improved performance and analyzing current limitations.

Contribution

It introduces a retrieval-augmented approach with LLMs for action prediction in Minecraft, highlighting the effectiveness of few-shot prompting and providing insights into performance gaps.

Findings

01

Few-shot prompting significantly improves Builder action prediction over baseline methods.

02

Analysis reveals key performance gaps for future research.

03

Demonstrates LLMs' potential in situated action generation.

Abstract

In the Minecraft Collaborative Building Task, two players collaborate: an Architect (A) provides instructions to a Builder (B) to assemble a specified structure using 3D blocks. In this work, we investigate the use of large language models (LLMs) to predict the sequence of actions taken by the Builder. Leveraging LLMs' in-context learning abilities, we use few-shot prompting techniques that significantly improve performance over baseline methods. Additionally, we present a detailed analysis of the gaps in performance for future work.
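
The abstract describes few-shot prompting, and the title pairs it with retrieval and code-style actions. The following is a minimal sketch of that general idea, not the authors' implementation: it retrieves the stored demonstrations most similar to a new Architect instruction and assembles them into a few-shot prompt. The example pairs, the `place(...)` and `pick(...)` action format, the token-overlap similarity, and all function names are assumptions made for illustration.

```python
from typing import List, Tuple

# Hypothetical (instruction, action code) pairs; not taken from the paper's data.
EXAMPLES: List[Tuple[str, str]] = [
    ("place a red block on the ground", "place('red', 0, 0, 0)"),
    ("put a blue block on top of the red one", "place('blue', 0, 1, 0)"),
    ("remove the blue block", "pick(0, 1, 0)"),
]


def jaccard(a: str, b: str) -> float:
    """Token-overlap similarity between two strings (an assumed retrieval metric)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def retrieve(query: str, k: int = 2) -> List[Tuple[str, str]]:
    """Return the k stored examples most similar to the query instruction."""
    ranked = sorted(EXAMPLES, key=lambda ex: jaccard(query, ex[0]), reverse=True)
    return ranked[:k]


def build_prompt(query: str, k: int = 2) -> str:
    """Assemble a few-shot prompt: retrieved demonstrations, then the new instruction."""
    parts = []
    for instruction, action in retrieve(query, k):
        parts.append(f"Architect: {instruction}\nBuilder: {action}")
    # The final turn is left open for the language model to complete.
    parts.append(f"Architect: {query}\nBuilder:")
    return "\n\n".join(parts)


prompt = build_prompt("place a green block on the ground")
```

The resulting `prompt` string would be sent to an LLM of choice, and the completion parsed as the predicted Builder action. A real system would likely replace the token-overlap measure with embedding-based retrieval; that substitution is also an assumption here.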

Taxonomy

Topics: Natural Language Processing Techniques · Human Motion and Animation · Video Analysis and Summarization