BAP v2: An Enhanced Task Framework for Instruction Following in Minecraft Dialogues
Prashant Jayannavar, Liliang Ren, Marisa Hudspeth, Risham Sidhu, Charlotte Lambert, Ariel Cordes, Elizabeth Kaplan, Anjali Narayan-Chen, Julia Hockenmaier

TL;DR
This paper introduces BAP v2, an improved framework for instruction following in Minecraft dialogues. It features an enhanced evaluation benchmark, synthetic training data, and a new state-of-the-art model, which together allow a better assessment of spatial reasoning in AI agents.
Contribution
The paper presents BAP v2, which comprises a refined evaluation benchmark, synthetic data for training spatial skills, and a new model, Llama-CRAFTS, that achieves improved performance on the grounded instruction-following task.
Findings
Synthetic data improves model performance.
Current models struggle with spatial reasoning.
BAP v2 provides a challenging benchmark for future research.
Abstract
Developing interactive agents that can understand language, perceive their surroundings, and act within the physical world is a long-standing goal of AI research. The Minecraft Collaborative Building Task (MCBT) (Narayan-Chen, Jayannavar, and Hockenmaier 2019), a two-player game in which an Architect (A) instructs a Builder (B) to construct a target structure in a simulated 3D Blocks World environment, offers a rich platform to work towards this goal. In this work, we focus on the Builder Action Prediction (BAP) subtask (Jayannavar, Narayan-Chen, and Hockenmaier 2020): predicting B's actions in a multimodal game context. BAP is a challenging testbed for grounded instruction following, with limited training data. We holistically re-examine this task and introduce BAP v2 to address key challenges in evaluation, training data, and modeling. Specifically, we define an enhanced evaluation…
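To make the BAP setup concrete, here is a minimal sketch, assuming Builder actions are encoded as place/remove operations on colored blocks at integer grid coordinates and that a prediction is scored by F1 over the net effect of its actions on the world. The tuple encoding and the names `net_actions` and `action_f1` are hypothetical illustrations; this is not the enhanced evaluation that BAP v2 defines.

```python
from typing import Dict, Iterable, Set, Tuple

# Hypothetical encoding (an assumption, not the paper's format):
# (action, x, y, z, color), where action is "place" or "remove".
Action = Tuple[str, int, int, int, str]
Cell = Tuple[int, int, int]


def net_actions(world: Dict[Cell, str], actions: Iterable[Action]) -> Set[Action]:
    """Collapse an action sequence into its net effect relative to `world`.

    A block that is placed and later removed cancels out; a cell whose
    color changes yields one removal and one placement.
    """
    final = dict(world)
    for action, x, y, z, color in actions:
        if action == "place":
            final[(x, y, z)] = color
        elif action == "remove":
            final.pop((x, y, z), None)
        else:
            raise ValueError(f"unknown action: {action}")

    net: Set[Action] = set()
    # Compare every cell that is occupied before or after the sequence.
    for cell in world.keys() | final.keys():
        before, after = world.get(cell), final.get(cell)
        if before == after:
            continue
        if before is not None:
            net.add(("remove", *cell, before))
        if after is not None:
            net.add(("place", *cell, after))
    return net


def action_f1(predicted: Set[Action], gold: Set[Action]) -> float:
    """Set-overlap F1 between predicted and gold net actions."""
    if not predicted and not gold:
        return 1.0  # both empty: the Builder correctly does nothing
    tp = len(predicted & gold)
    if tp == 0:
        return 0.0
    precision = tp / len(predicted)
    recall = tp / len(gold)
    return 2 * precision * recall / (precision + recall)


# Usage: the prediction gets one of two gold placements right and adds
# a block that it removes again, which cancels out in the net effect.
world: Dict[Cell, str] = {(0, 0, 0): "red"}
gold = net_actions(world, [("place", 0, 1, 0, "blue"), ("place", 1, 0, 0, "red")])
pred = net_actions(
    world,
    [("place", 0, 1, 0, "blue"), ("place", 2, 0, 0, "red"), ("remove", 2, 0, 0, "red")],
)
# Precision 1.0, recall 0.5, so F1 is about 0.67.
print(action_f1(pred, gold))
```

Scoring net effects rather than raw action sequences is a design choice in this sketch: it avoids penalizing a Builder that reaches the right structure through a different order of moves or a self-corrected mistake.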
