MineDreamer: Learning to Follow Instructions via Chain-of-Imagination for Simulated-World Control

Enshen Zhou; Yiran Qin; Zhenfei Yin; Yuzhou Huang; Ruimao Zhang; Lu Sheng; Yu Qiao; Jing Shao

arXiv:2403.12037·cs.CV·March 20, 2024·2 cites

TL;DR

MineDreamer is an innovative embodied agent in Minecraft that uses a Chain-of-Imagination mechanism with multimodal models to follow complex instructions more accurately and reliably than previous methods.

Contribution

It introduces a novel Chain-of-Imagination approach combined with multimodal models to enhance instruction-following in simulated-world control tasks.

Findings

01 Significantly outperforms baseline agents in instruction-following accuracy.

02 Nearly doubles the performance of existing generalist agents.

03 Demonstrates strong generalization and understanding of the open world.

Abstract

It is a long-lasting goal to design a generalist-embodied agent that can follow diverse instructions in human-like ways. However, existing approaches often fail to steadily follow instructions due to difficulties in understanding abstract and sequential natural language instructions. To this end, we introduce MineDreamer, an open-ended embodied agent built upon the challenging Minecraft simulator with an innovative paradigm that enhances instruction-following ability in low-level control signal generation. Specifically, MineDreamer is developed on top of recent advances in Multimodal Large Language Models (MLLMs) and diffusion models, and we employ a Chain-of-Imagination (CoI) mechanism to envision the step-by-step process of executing instructions and translating imaginations into more precise visual prompts tailored to the current state; subsequently, the agent generates…
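The control loop the abstract describes (imagine the next stage of the instruction from the current state, turn that imagination into a visual prompt, then generate a low-level action) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: `ChainOfImaginationAgent`, `imagine`, `act`, `env_step`, and the fixed `reimagine_every` interval are all hypothetical names and assumptions, and plain strings stand in for image frames and control signals.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

# Toy stand-ins (assumptions): real systems would use image tensors and
# keyboard/mouse control signals instead of strings.
Observation = str
Goal = str      # an "imagined" future frame used as a visual prompt
Action = str


@dataclass
class ChainOfImaginationAgent:
    # Hypothetical imaginator: (instruction, current observation) -> imagined goal.
    imagine: Callable[[str, Observation], Goal]
    # Hypothetical goal-conditioned policy: (observation, goal) -> low-level action.
    act: Callable[[Observation, Goal], Action]
    # Assumed refresh interval; the source does not specify one here.
    reimagine_every: int = 3

    def run(
        self,
        instruction: str,
        env_step: Callable[[Action], Observation],
        first_obs: Observation,
        max_steps: int,
    ) -> List[Tuple[Goal, Action]]:
        obs = first_obs
        goal: Optional[Goal] = None
        trace: List[Tuple[Goal, Action]] = []
        for t in range(max_steps):
            # Refresh the imagined goal so it stays tailored to the current state.
            if t % self.reimagine_every == 0:
                goal = self.imagine(instruction, obs)
            action = self.act(obs, goal)
            trace.append((goal, action))
            obs = env_step(action)
        return trace
```

Re-imagining on a schedule, rather than once at the start, is one simple way to keep the visual prompt tied to the current state; other triggers are equally plausible under the abstract's description.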

Code & Models

Repositories

Zhoues/MineDreamer
PyTorch · Official

Datasets

Zhoues/Goal-Drift-Dataset
dataset · 15 dl

Taxonomy

Topics: Intelligent Tutoring Systems and Adaptive Learning · Human Motion and Animation · Model Reduction and Neural Networks

Methods: Diffusion