MineNPC-Task: Task Suite for Memory-Aware Minecraft Agents

Tamil Sudaravan Mohan Doss; Michael Xu; Sudha Rao; Andrew D. Wilson; Balasaravanan Thoravi Kumaravel

arXiv:2601.05215 · cs.AI · January 12, 2026

TL;DR

This paper introduces MineNPC-Task, a user-authored benchmark and evaluation harness for memory-aware Minecraft agents built from real co-play tasks. It emphasizes transparency, reproducibility, and assessment based only on in-world evidence.

Contribution

It presents a novel, human-authored task suite with explicit dependencies and validators, enabling realistic and reproducible evaluation of memory and planning in Minecraft agents.
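
To make "explicit dependencies and validators" concrete, here is a minimal sketch of how a parametric task template might be represented. It assumes a Python harness and uses a plain dictionary of inventory counts as the in-world snapshot. `Subtask`, `make_collect`, `execution_order`, and `WorldState` are hypothetical names for illustration, not the authors' code. Dependencies are resolved with the standard library's `graphlib.TopologicalSorter`, which is one reasonable way to order dependent subtasks.

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter
from typing import Callable, Dict, List, Optional

# Hypothetical in-world snapshot: only facts observable in the game
# (here, inventory counts), never out-of-world data such as the world seed.
WorldState = Dict[str, int]


@dataclass
class Subtask:
    name: str
    # Machine-checkable predicate evaluated against in-world evidence only.
    validator: Callable[[WorldState], bool]
    # Names of subtasks that must be completed first.
    depends_on: List[str] = field(default_factory=list)


def make_collect(item: str, count: int,
                 depends_on: Optional[List[str]] = None) -> Subtask:
    """Hypothetical parametric template: 'collect <count> <item>'."""
    return Subtask(
        name=f"collect_{count}_{item}",
        validator=lambda state: state.get(item, 0) >= count,
        depends_on=list(depends_on or []),
    )


def execution_order(subtasks: List[Subtask]) -> List[str]:
    """Order subtasks so each follows its dependencies; raises CycleError on cycles."""
    graph = {t.name: set(t.depends_on) for t in subtasks}
    return list(TopologicalSorter(graph).static_order())


logs = make_collect("oak_log", 3)
planks = make_collect("oak_planks", 12, depends_on=[logs.name])
print(execution_order([planks, logs]))  # ['collect_3_oak_log', 'collect_12_oak_planks']
print(planks.validator({"oak_planks": 8}))  # False
```

Keeping validators as pure functions of the observed world state is one way to honor a bounded-knowledge policy: a subtask counts as done only if the game itself shows it, regardless of what the agent claims.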

Findings

1. Identified common failure patterns in code, inventory, and navigation.
2. Demonstrated successful recovery through clarifications and memory use.
3. Participants rated interaction quality positively.

Abstract

We present MineNPC-Task, a user-authored benchmark and evaluation harness for testing memory-aware, mixed-initiative LLM agents in open-world Minecraft. Rather than relying on synthetic prompts, tasks are elicited through formative and summative co-play with expert players, then normalized into parametric templates with explicit preconditions and dependency structure. These tasks are paired with machine-checkable validators under a bounded-knowledge policy that forbids out-of-world shortcuts. The harness captures plan, action, and memory events, including plan previews, targeted clarifications, memory reads and writes, precondition checks, and repair attempts, and reports outcomes relative to the total number of attempted subtasks using only in-world evidence. As an initial snapshot, we instantiate the framework with GPT-4o and evaluate 216 subtasks across 8 experienced players. We…
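
The abstract says the harness logs plan, action, and memory events and reports outcomes relative to the total number of attempted subtasks. It does not give a log schema or metric formula, so the following is only a sketch under that assumption. `EventKind`, `Event`, `Outcome`, and `summarize` are hypothetical names. The event categories mirror those listed in the abstract, and each rate uses attempted subtasks (successful or not) as its denominator. The toy data below is illustrative and is not taken from the paper's results.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List


class EventKind(Enum):
    # Event categories named in the abstract; the enum itself is hypothetical.
    PLAN_PREVIEW = "plan_preview"
    CLARIFICATION = "clarification"
    MEMORY_READ = "memory_read"
    MEMORY_WRITE = "memory_write"
    PRECONDITION_CHECK = "precondition_check"
    ACTION = "action"
    REPAIR_ATTEMPT = "repair_attempt"


@dataclass(frozen=True)
class Event:
    subtask_id: str
    kind: EventKind
    detail: str = ""


@dataclass(frozen=True)
class Outcome:
    subtask_id: str
    validated: bool  # decided by an in-world validator, not the agent's self-report


def summarize(outcomes: List[Outcome], events: List[Event]) -> Dict[str, float]:
    """Report rates with the number of attempted subtasks as the denominator."""
    attempted = len(outcomes)
    if attempted == 0:
        return {"attempted": 0, "success_rate": 0.0,
                "clarifications_per_subtask": 0.0, "repairs_per_subtask": 0.0}
    successes = sum(o.validated for o in outcomes)
    counts = Counter(e.kind for e in events)
    return {
        "attempted": attempted,
        "success_rate": successes / attempted,
        "clarifications_per_subtask": counts[EventKind.CLARIFICATION] / attempted,
        "repairs_per_subtask": counts[EventKind.REPAIR_ATTEMPT] / attempted,
    }


# Toy data for illustration only.
outcomes = [Outcome("s1", True), Outcome("s2", False),
            Outcome("s3", True), Outcome("s4", True)]
events = [Event("s2", EventKind.CLARIFICATION),
          Event("s2", EventKind.REPAIR_ATTEMPT),
          Event("s2", EventKind.REPAIR_ATTEMPT)]
print(summarize(outcomes, events))
# {'attempted': 4, 'success_rate': 0.75, 'clarifications_per_subtask': 0.25, 'repairs_per_subtask': 0.5}
```

Dividing by attempted subtasks rather than by completed ones keeps failures visible in every rate, which matches the abstract's emphasis on reporting relative to everything the agent tried.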

Taxonomy

Topics: Artificial Intelligence in Games · Multimodal Machine Learning Applications · Multi-Agent Systems and Negotiation