MCU: An Evaluation Framework for Open-Ended Game Agents

Xinyue Zheng; Haowei Lin; Kaichen He; Zihao Wang; Zilong Zheng; Yitao Liang

arXiv:2310.08367 · cs.AI · June 4, 2025 · 1 citation

TL;DR

This paper introduces MCU, a comprehensive evaluation framework within Minecraft that assesses open-ended AI agents across thousands of diverse, composable tasks, revealing current agents' struggles with complexity and diversity.

Contribution

The paper presents MCU, a scalable, diverse, and human-aligned evaluation framework for open-ended game agents in Minecraft, addressing limitations of existing benchmarks.

Findings

01 State-of-the-art agents struggle with complex, diverse tasks.
02 MCU's evaluation achieves 91.5% alignment with human ratings.
03 The framework can generate an unlimited number of tasks with varying difficulty.

Abstract

Developing AI agents capable of interacting with open-world environments to solve diverse tasks is a compelling challenge. However, evaluating such open-ended agents remains difficult, with current benchmarks facing scalability limitations. To address this, we introduce Minecraft Universe (MCU), a comprehensive evaluation framework set within the open-world video game Minecraft. MCU incorporates three key components: (1) an expanding collection of 3,452 composable atomic tasks that encompasses 11 major categories and 41 subcategories of challenges; (2) a task composition mechanism capable of generating infinite diverse tasks with varying difficulty; and (3) a general evaluation framework that achieves 91.5% alignment with human ratings for open-ended task assessment. Empirical results reveal that even state-of-the-art foundation agents struggle with the increasing diversity and…
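To give a rough sense of why composing atomic tasks yields such a large task space, the sketch below counts composite tasks under a simplifying assumption that is not taken from the paper: a composite task is treated as an unordered set of k distinct atomic tasks drawn from the pool of 3,452. The function names and the toy task names are hypothetical. MCU's actual composition mechanism may also vary parameters, order, or difficulty, which this sketch does not model.

```python
from itertools import combinations
from math import comb

# Size of the atomic task pool reported in the abstract.
NUM_ATOMIC_TASKS = 3452


def num_composite_tasks(pool_size: int, k: int) -> int:
    """Count unordered k-task combinations from a pool.

    Hypothetical model: a composite task is a set of k distinct atomic tasks.
    This is an illustration only, not MCU's actual composition rule.
    """
    return comb(pool_size, k)


def compose_tasks(atomic_tasks: list[str], k: int) -> list[tuple[str, ...]]:
    """Enumerate every unordered k-task composition (practical only for tiny pools)."""
    return list(combinations(atomic_tasks, k))


if __name__ == "__main__":
    # Growth of the composite space as more atomic tasks are combined.
    for k in (1, 2, 3):
        print(k, num_composite_tasks(NUM_ATOMIC_TASKS, k))

    # Toy pool with hypothetical task names, small enough to list in full.
    toy_pool = ["mine_log", "craft_table", "kill_zombie"]
    print(compose_tasks(toy_pool, 2))
```

Under this simplified model, pairs alone give `num_composite_tasks(3452, 2) == 5956426` composite tasks and triples give 6,849,889,900, so even small compositions outgrow any hand-curated benchmark.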

Code & Models

Repositories

craftjarvis/mcu · pytorch · Official

Videos

MCU: An Evaluation Framework for Open-Ended Game Agents · SlidesLive

Taxonomy

Topics: Artificial Intelligence in Games · Multi-Agent Systems and Negotiation · AI-based Problem Solving and Planning