Optimus-2: Multimodal Minecraft Agent with Goal-Observation-Action Conditioned Policy

Zaijing Li; Yuquan Xie; Rui Shao; Gongwei Chen; Dongmei Jiang; Liqiang Nie

arXiv:2502.19902·cs.AI·March 12, 2025

TL;DR

Optimus-2 is a multimodal Minecraft agent that pairs a Multimodal Large Language Model (MLLM) for high-level planning with a Goal-Observation-Action Conditioned Policy (GOAP) for low-level control. Supported by a large dataset, it improves performance on diverse atomic and long-horizon tasks.

Contribution

The paper presents a new multimodal Minecraft agent with a goal-observation-action conditioned policy and a large dataset, advancing open-world task learning.

Findings

01. Superior performance on atomic and long-horizon tasks
02. Effective modeling of causal relationships between observations and actions
03. Successful alignment of behavior tokens with language instructions

Abstract

Building an agent that can mimic human behavior patterns to accomplish various open-world tasks is a long-term goal. To enable agents to effectively learn behavioral patterns across diverse tasks, a key challenge lies in modeling the intricate relationships among observations, actions, and language. To this end, we propose Optimus-2, a novel Minecraft agent that incorporates a Multimodal Large Language Model (MLLM) for high-level planning, alongside a Goal-Observation-Action Conditioned Policy (GOAP) for low-level control. GOAP contains (1) an Action-guided Behavior Encoder that models causal relationships between observations and actions at each timestep, then dynamically interacts with the historical observation-action sequence, consolidating it into fixed-length behavior tokens, and (2) an MLLM that aligns behavior tokens with open-ended language instructions to predict actions…
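
The abstract describes consolidating a variable-length observation-action history into fixed-length behavior tokens. The sketch below illustrates that general idea only, under loud assumptions: it uses attention pooling with a fixed set of query vectors, and the per-timestep fusion is a plain element-wise sum. The names `fuse_step`, `consolidate`, `DIM`, and `NUM_TOKENS` are hypothetical, the features are random placeholders, and none of this is taken from the paper's actual encoder.

```python
import math
import random


def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_step(obs, act):
    # Hypothetical per-timestep fusion: element-wise sum of the
    # observation and action feature vectors (an assumption).
    return [o + a for o, a in zip(obs, act)]


def consolidate(history, queries):
    # Attention-pool a history of any length into len(queries) tokens.
    # Each query attends over all timesteps and returns a weighted average.
    dim = len(queries[0])
    tokens = []
    for q in queries:
        scores = [
            sum(qi * hi for qi, hi in zip(q, h)) / math.sqrt(dim)
            for h in history
        ]
        weights = softmax(scores)
        tokens.append(
            [sum(w * h[d] for w, h in zip(weights, history)) for d in range(dim)]
        )
    return tokens


random.seed(0)
DIM, NUM_TOKENS = 4, 2  # illustrative sizes, not from the paper

# Fixed queries stand in for what would be learned parameters.
queries = [[random.gauss(0, 1) for _ in range(DIM)] for _ in range(NUM_TOKENS)]


def behavior_tokens(num_steps):
    # Build a random placeholder history, then consolidate it.
    history = [
        fuse_step(
            [random.random() for _ in range(DIM)],
            [random.random() for _ in range(DIM)],
        )
        for _ in range(num_steps)
    ]
    return consolidate(history, queries)


short_tokens = behavior_tokens(3)
long_tokens = behavior_tokens(50)
print(len(short_tokens), len(long_tokens))  # same token count for both histories
```

Whatever the history length, the output has `NUM_TOKENS` vectors of size `DIM`, which is the property that would let a downstream model consume a constant-size summary of past behavior.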

Code & Models

Datasets

MinecraftOptimus/MGOA · dataset · 220 downloads

Taxonomy

Topics: Multimodal Machine Learning Applications · Reinforcement Learning in Robotics · Machine Learning in Healthcare