MineDojo: Building Open-Ended Embodied Agents with Internet-Scale Knowledge
Linxi Fan, Guanzhi Wang, Yunfan Jiang, Ajay Mandlekar, Yuncong Yang, Haoyi Zhu, Andrew Tang, De-An Huang, Yuke Zhu, Anima Anandkumar

TL;DR
MineDojo introduces a comprehensive framework for building generalist embodied agents in Minecraft, integrating diverse tasks, large-scale multimodal knowledge, and a novel learning algorithm leveraging pre-trained models to achieve open-ended task solving.
Contribution
The paper presents MineDojo, a scalable environment and knowledge base, along with a new learning algorithm that enables agents to solve diverse open-ended tasks without manual rewards.
Findings
Agents can solve a variety of open-ended tasks in Minecraft without manually designed rewards.
Pre-trained video-language models can serve as learned reward functions for these tasks.
Open-source resources facilitate further research in embodied AI.
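To make the second finding concrete, the sketch below shows one simple way a video-language model could act as a reward function: score how well an embedding of the agent's recent video frames matches an embedding of the language goal. This is an illustrative assumption, not the paper's exact formulation. The function names `cosine_similarity` and `language_reward` are hypothetical, and the embeddings are assumed to come from some pre-trained video and text encoders that are not shown here.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # A zero vector carries no direction, so treat it as "no match".
        return 0.0
    return dot / (norm_a * norm_b)


def language_reward(video_embedding: Sequence[float],
                    text_embedding: Sequence[float]) -> float:
    """Hypothetical reward: similarity between a video-clip embedding and a
    task-description embedding, clipped at zero so that a mismatch gives no
    reward rather than a penalty (an assumption made for this sketch)."""
    return max(0.0, cosine_similarity(video_embedding, text_embedding))
```

In a training loop, a reward of this kind would be computed at each step from the latest frames and the task prompt, and passed to a standard reinforcement learning algorithm in place of a hand-written reward.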
Abstract
Autonomous agents have made great strides in specialist domains like Atari games and Go. However, they typically learn tabula rasa in isolated environments with limited and manually conceived objectives, thus failing to generalize across a wide spectrum of tasks and capabilities. Inspired by how humans continually learn and adapt in the open world, we advocate a trinity of ingredients for building generalist agents: 1) an environment that supports a multitude of tasks and goals, 2) a large-scale database of multimodal knowledge, and 3) a flexible and scalable agent architecture. We introduce MineDojo, a new framework built on the popular Minecraft game that features a simulation suite with thousands of diverse open-ended tasks and an internet-scale knowledge base with Minecraft videos, tutorials, wiki pages, and forum discussions. Using MineDojo's data, we propose a novel agent learning…
Taxonomy
Topics: Topic Modeling · Multimodal Machine Learning Applications
Methods: Balanced Selection
