MineAnyBuild: Benchmarking Spatial Planning for Open-world AI Agents
Ziming Wei, Bingqian Lin, Zijian Jiao, Yunshuang Nie, Liang Ma, Yuecheng Liu, Yuzheng Zhuang, Xiaodan Liang

TL;DR
MineAnyBuild is a comprehensive benchmark designed to evaluate the spatial planning abilities of open-world AI agents in Minecraft. It focuses on understanding, reasoning, creativity, and commonsense, and it reveals both the current limitations of these agents and their potential for future progress.
Contribution
The paper introduces MineAnyBuild, a novel, expandable benchmark for assessing spatial planning in AI agents within Minecraft, a complex open-world environment.
Findings
Existing MLLM agents show significant limitations in spatial planning.
MineAnyBuild reveals the potential for improving AI spatial reasoning.
The benchmark supports large-scale, multi-modal, and diverse spatial tasks.
Abstract
Spatial planning is a crucial part of spatial intelligence: it requires understanding and planning how objects are arranged in space. AI agents with spatial planning ability can better adapt to various real-world applications, including robotic manipulation, automatic assembly, and urban planning. Recent works have attempted to construct benchmarks for evaluating the spatial intelligence of Multimodal Large Language Models (MLLMs). Nevertheless, these benchmarks primarily focus on spatial reasoning in typical Visual Question-Answering (VQA) formats, which leaves a gap between abstract spatial understanding and concrete task execution. In this work, we take a step further and build a comprehensive benchmark called MineAnyBuild, aiming to evaluate the spatial planning ability of open-world AI agents in the game Minecraft. Specifically,…
Taxonomy
Topics: Multi-Agent Systems and Negotiation
Methods: Focus
