Odyssey: Empowering Minecraft Agents with Open-World Skills

Shunyu Liu; Yaoru Li; Kongcheng Zhang; Zhenyu Cui; Wenkai Fang; Yuxuan Zheng; Tongya Zheng; Mingli Song

arXiv:2407.15325 · cs.AI · June 3, 2025

TL;DR

Odyssey introduces an LLM-based agent framework for Minecraft that enhances open-world exploration and long-term planning through a rich skill library and specialized training, advancing autonomous agent capabilities.

Contribution

The paper presents Odyssey, a novel framework combining a large skill library, fine-tuned LLaMA-3, and new benchmarks to improve open-world skills in Minecraft agents.

Findings

1. Effective evaluation of LLM-based agent capabilities
2. Enhanced autonomous exploration in Minecraft
3. Improved long-term planning performance

Abstract

Recent studies have delved into constructing generalist agents for open-world environments like Minecraft. Despite the encouraging results, existing efforts mainly focus on solving basic programmatic tasks, e.g., material collection and tool-crafting following the Minecraft tech-tree, treating the ObtainDiamond task as the ultimate goal. This limitation stems from the narrowly defined set of actions available to agents, requiring them to learn effective long-horizon strategies from scratch. Consequently, discovering diverse gameplay opportunities in the open world becomes challenging. In this work, we introduce Odyssey, a new framework that empowers Large Language Model (LLM)-based agents with open-world skills to explore the vast Minecraft world. Odyssey comprises three key parts: (1) An interactive agent with an open-world skill library that consists of 40 primitive skills and 183…
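The idea of an agent drawing on a library of primitive skills can be illustrated with a minimal sketch. Everything below is a hypothetical illustration, not the Odyssey implementation: the `Skill` and `SkillLibrary` classes, the skill names, the dictionary inventory, and the word-overlap lookup are all assumptions made for the example. It shows primitive skills registered by name, a higher-level skill composed from them, and a toy lookup that picks a skill from a text query.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical world state: item name -> count (an assumption for this sketch).
Inventory = Dict[str, int]


@dataclass
class Skill:
    name: str
    description: str
    run: Callable[[Inventory], None]


@dataclass
class SkillLibrary:
    skills: Dict[str, Skill] = field(default_factory=dict)

    def register(self, skill: Skill) -> None:
        self.skills[skill.name] = skill

    def compose(self, name: str, description: str, steps: List[str]) -> Skill:
        """Build a higher-level skill that runs existing skills in order."""
        missing = [s for s in steps if s not in self.skills]
        if missing:
            raise KeyError(f"unknown skills: {missing}")

        def run(inv: Inventory) -> None:
            for step in steps:
                self.skills[step].run(inv)

        skill = Skill(name, description, run)
        self.register(skill)
        return skill

    def retrieve(self, query: str, k: int = 1) -> List[Skill]:
        """Toy lookup: rank skills by word overlap with the query."""
        words = set(query.lower().split())
        ranked = sorted(
            self.skills.values(),
            key=lambda s: len(words & set(s.description.lower().split())),
            reverse=True,
        )
        return ranked[:k]


# Two hypothetical primitive skills acting on the inventory.
def mine_log(inv: Inventory) -> None:
    inv["log"] = inv.get("log", 0) + 1


def craft_planks(inv: Inventory) -> None:
    if inv.get("log", 0) < 1:
        raise RuntimeError("need at least one log")
    inv["log"] -= 1
    inv["planks"] = inv.get("planks", 0) + 4


library = SkillLibrary()
library.register(Skill("mineLog", "mine one wood log from a tree", mine_log))
library.register(Skill("craftPlanks", "craft four planks from one log", craft_planks))
library.compose(
    "obtainPlanks",
    "obtain planks by mining a log then crafting",
    ["mineLog", "craftPlanks"],
)

inventory: Inventory = {}
library.retrieve("obtain planks")[0].run(inventory)
```

In this sketch the planner only has to choose among named skills rather than emit low-level actions, which is the trade-off the reviewers below discuss: a curated library reduces what the model must generate, at the cost of being bounded by the skills that were written in advance.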

Peer Reviews

Decision · ICLR 2025 Conference Withdrawn Submission

Reviewer 01 · Rating 3 · Confidence 3

Strengths

1. This paper demonstrates substantial effort, including the collection of Minecraft-specific data, fine-tuning a large language model, building a Minecraft agent, comparing it with numerous baselines, and designing three evaluation benchmarks.
2. The paper is well-formatted, with clear and coherent expression of ideas, making it easy for readers to follow and understand.

Weaknesses

I strongly agree with the paper’s critique that “current research in Minecraft is overly focused on tasks like mining diamonds.” Minecraft is indeed a valuable platform for studying generalist agents, as it simulates numerous real-world challenges such as complex perception, an infinite task space, partial observability, and intricate terrains—all unsolved issues. Developing agents in Minecraft should ideally contribute towards generalization in other environments, even the real world. However,

Reviewer 02 · Rating 5 · Confidence 4

Strengths

+ Overall, the paper is clearly written, the graphics are stylish, and the write-up is good.
+ The research topic (open-world agents, LLMs, etc.) is relevant to the interests of the NeurIPS community.
+ The proposed benchmark is interesting and somewhat comprehensive in terms of the diversity and complexity of tasks and the open-world capabilities that can be evaluated.

Weaknesses

- The contributions, though they require a considerable amount of work, do not reach the significance expected of a paper at a top-tier conference like ICLR. Indeed, I found that the three pillars (the primitive skill library, the LLM for Minecraft QA, and the benchmark) are loosely connected, and it is unclear how they benefit open-world Minecraft agents as a whole. More importantly, it is not obvious to me how these pillars can be distinguished from several prior works

Reviewer 03 · Rating 3 · Confidence 5

Strengths

1. The visual illustrations are appealing and elaborate.
2. The appendix provides a thorough and detailed explanation of the methods.

Weaknesses

1. ODYSSEY’s pipeline is highly similar to existing frameworks such as Voyager, Optimus-1 [1], and ADAM [2].
2. ODYSSEY relies on predefined primitive skills, which were generated by GPT-4, whereas GPT-4 itself can directly write JavaScript programs based on Mineflayer. This approach of relying on primitive skills limits the agent’s ability to perform more complex and open-ended tasks, such as building.
3. On programmatic tasks, ODYSSEY does not demonstrate a broader task range compared to baselin

Reviewer 04 · Rating 6 · Confidence 5

Strengths

- The paper is polished and well-written.
- Experiments and analyses of results are thorough. Models that are trained and evaluated using the proposed framework are compared against relevant baselines.
- The code released by the authors is clean and easy to use.
- The performance of LMs under agentic frameworks like Voyager, which prompt models to generate skill libraries as code from scratch, depends strongly on the ability of the base model to generate quality code. In contrast, the Odyssey f

Weaknesses

- The proposed framework has limited novelty. Decomposing complex decision-making tasks with hand-engineered skill libraries has a very long history in robotics [1,2].
- The Odyssey framework is designed specifically for Minecraft. Agentic performance is significantly boosted through the careful design of useful, hand-engineered low-level skills. As a result, it is unclear to what extent good LM performance on Minecraft with Odyssey would transfer to other, more practical open-world environment

Code & Models

Repositories

zju-vipa/odyssey (Official)

Taxonomy

Topics: Multi-Agent Systems and Negotiation · Mobile Agent-Based Network Management · Robotic Path Planning Algorithms

Methods: Sparse Evolutionary Training · Focus · Lib