Retrieval-Augmented Code Generation for Situated Action Generation: A Case Study on Minecraft
Chalamalasetti Kranti, Sherzod Hakimov, David Schlangen

TL;DR
This paper explores using large language models (LLMs) with few-shot prompting to predict Builder actions in the Minecraft Collaborative Building Task. It reports gains over baseline methods and analyzes the remaining limitations.
Contribution
The paper introduces a retrieval-augmented approach that uses LLMs for Builder action prediction in Minecraft. It highlights the effectiveness of few-shot prompting and provides insights into where performance still falls short.
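The retrieval-augmented idea can be illustrated with a small sketch: for a new Architect instruction, retrieve the most similar stored examples and place them in the prompt as few-shot demonstrations. Everything below is an assumption for illustration, not the authors' implementation: the example pool, the hypothetical `place`/`pick` action syntax, and the bag-of-words cosine retriever are all stand-ins.

```python
import math
import re
from collections import Counter

# Hypothetical example pool: (Architect instruction, Builder actions written as code).
# The place/pick names and argument layout are illustrative only.
EXAMPLE_POOL = [
    ("put a red block in the middle",
     "place(color='red', x=0, y=1, z=0)"),
    ("build a tower of three blue blocks on the left corner",
     "place(color='blue', x=-5, y=1, z=-5)\n"
     "place(color='blue', x=-5, y=2, z=-5)\n"
     "place(color='blue', x=-5, y=3, z=-5)"),
    ("remove the green block on top",
     "pick(x=0, y=2, z=0)"),
]


def bag_of_words(text: str) -> Counter:
    """Lower-cased word counts, used as a crude sparse vector."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, pool, k: int = 2):
    """Return the k pool entries whose instructions are most similar to the query."""
    q = bag_of_words(query)
    ranked = sorted(pool, key=lambda ex: cosine(q, bag_of_words(ex[0])), reverse=True)
    return ranked[:k]


def build_prompt(query: str, pool, k: int = 2) -> str:
    """Assemble a few-shot prompt: retrieved demonstrations, then the new instruction."""
    parts = ["Translate Architect instructions into Builder actions."]
    for instruction, actions in retrieve(query, pool, k):
        parts.append(f"Instruction: {instruction}\nActions:\n{actions}")
    parts.append(f"Instruction: {query}\nActions:")
    return "\n\n".join(parts)


if __name__ == "__main__":
    print(build_prompt("build a blue tower on the left", EXAMPLE_POOL))
```

For the query "build a blue tower on the left", the tower example ranks first, so the LLM sees the most relevant demonstration. A dense embedding retriever could replace `cosine` without changing the prompt-assembly logic.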
Findings
Few-shot prompting significantly improves performance over baseline methods.
Analysis reveals key performance gaps for future research.
Demonstrates LLMs' potential in situated action generation.
Abstract
In the Minecraft Collaborative Building Task, two players collaborate: an Architect (A) provides instructions to a Builder (B) to assemble a specified structure using 3D blocks. In this work, we investigate the use of large language models (LLMs) to predict the sequence of actions taken by the Builder. Leveraging LLMs' in-context learning abilities, we use few-shot prompting techniques, which significantly improve performance over baseline methods. Additionally, we present a detailed analysis of the gaps in performance for future work.
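The abstract does not name the evaluation metric. One common way to score predicted Builder actions against the gold actions is an F1 score over the set of block actions; the sketch below assumes that metric and a hypothetical `(kind, color, x, y, z)` tuple encoding, and it does not collapse sequences where a block is placed and later removed.

```python
from typing import Iterable, Tuple

# Hypothetical action encoding: (kind, color, x, y, z), e.g. ("place", "red", 0, 1, 0).
Action = Tuple[str, str, int, int, int]


def action_f1(predicted: Iterable[Action], gold: Iterable[Action]) -> float:
    """Set-based F1 between predicted and gold actions (order is ignored)."""
    pred, ref = set(predicted), set(gold)
    if not pred and not ref:
        return 1.0  # nothing to do and nothing predicted counts as a perfect match
    tp = len(pred & ref)
    if tp == 0:
        return 0.0
    precision = tp / len(pred)
    recall = tp / len(ref)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    gold = [("place", "red", 0, 1, 0), ("place", "blue", 1, 1, 0)]
    pred = [("place", "red", 0, 1, 0), ("place", "blue", 2, 1, 0)]
    print(action_f1(pred, gold))  # one of two actions correct on each side
```

Here one of the two predicted placements matches the gold structure, so precision and recall are both 0.5 and the F1 is 0.5.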
Taxonomy
Topics: Natural Language Processing Techniques · Human Motion and Animation · Video Analysis and Summarization
