MindForge: Empowering Embodied Agents with Theory of Mind for Lifelong Cultural Learning
Mircea Lică, Ojas Shirekar, Baptiste Colle, Chirag Raman

TL;DR
MindForge is a framework for embodied agents that combines a structured theory of mind, natural inter-agent communication, and a multi-component memory system to improve lifelong cultural learning and collaboration in Minecraft, outperforming the Voyager baseline.
Contribution
The paper presents a novel generative-agent framework with explicit perspective taking, linking percepts, beliefs, desires, and actions for improved cultural lifelong learning.
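As a rough illustration of what a structured representation linking percepts, beliefs, desires, and actions might look like, here is a minimal sketch. Every name in it (`MentalState`, `PerspectiveTakingAgent`, `observe`, `hear`, `model_partner`) is hypothetical and chosen for this example; it is not the paper's implementation, and it assumes perspective taking can be approximated by keeping a separate mental-state record for each partner.

```python
from dataclasses import dataclass, field


@dataclass
class MentalState:
    """Hypothetical container for the four linked components."""
    percepts: list[str] = field(default_factory=list)      # what was observed
    beliefs: dict[str, str] = field(default_factory=dict)  # topic -> held claim
    desires: list[str] = field(default_factory=list)       # current goals
    actions: list[str] = field(default_factory=list)       # actions taken so far


@dataclass
class PerspectiveTakingAgent:
    """Hypothetical agent that tracks its own state and a model of each partner."""
    name: str
    own: MentalState = field(default_factory=MentalState)
    partner_models: dict[str, MentalState] = field(default_factory=dict)

    def observe(self, percept: str) -> None:
        # Record a new percept in the agent's own mental state.
        self.own.percepts.append(percept)

    def model_partner(self, partner: str) -> MentalState:
        # Return the model of a partner, creating an empty one on first use.
        return self.partner_models.setdefault(partner, MentalState())

    def hear(self, partner: str, topic: str, claim: str) -> None:
        # Attribute a belief to the partner based on what they communicated.
        self.model_partner(partner).beliefs[topic] = claim


agent = PerspectiveTakingAgent(name="learner")
agent.observe("oak log nearby")
agent.hear("teacher", "crafting_table", "needs 4 planks")
print(agent.partner_models["teacher"].beliefs)
```

Keeping the partner model separate from the agent's own state is the design choice that makes perspective taking explicit: the agent can compare what it believes with what it thinks a collaborator believes.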
Findings
On basic tasks, MindForge agents reach 3x more tech-tree milestones than Voyager.
Agents collect 2.3x more unique items than Voyager.
Communication rounds improve collaborative performance, aligning with the Condorcet Jury Theorem.
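For readers unfamiliar with the Condorcet Jury Theorem: it states that if each of n independent voters is correct with probability p > 0.5, the probability that the majority is correct grows with n. The sketch below computes that majority probability from the binomial distribution. It illustrates the theorem only; how the paper maps communication rounds or agents onto independent voters is not stated in this summary, and the function name `majority_correct_probability` is an assumption made for this example.

```python
from math import comb


def majority_correct_probability(n_voters: int, p_correct: float) -> float:
    """Probability that a strict majority of independent voters is correct.

    Assumes an odd number of voters (so ties cannot occur) and that each
    voter is independently correct with the same probability p_correct.
    """
    if n_voters < 1 or n_voters % 2 == 0:
        raise ValueError("n_voters must be a positive odd integer")
    if not 0.0 <= p_correct <= 1.0:
        raise ValueError("p_correct must lie in [0, 1]")
    majority = n_voters // 2 + 1
    # Sum the binomial probabilities of k correct votes for every k that
    # forms a strict majority.
    return sum(
        comb(n_voters, k) * p_correct**k * (1 - p_correct) ** (n_voters - k)
        for k in range(majority, n_voters + 1)
    )


for n in (1, 3, 5, 7):
    print(n, round(majority_correct_probability(n, 0.6), 4))
```

With p above 0.5 the printed probabilities increase with n; with p below 0.5 they decrease, which is the other half of the theorem.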
Abstract
Embodied agents powered by large language models (LLMs), such as Voyager, promise open-ended competence in worlds such as Minecraft. However, when powered by open-weight LLMs, they still falter on elementary tasks even after domain-specific fine-tuning. We propose MindForge, a generative-agent framework for cultural lifelong learning through explicit perspective taking. We introduce three key innovations: (1) a structured theory of mind representation linking percepts, beliefs, desires, and actions; (2) natural inter-agent communication; and (3) a multi-component memory system. Following the cultural learning framework, we test MindForge in both instructive and collaborative settings within Minecraft. In an instructive setting with GPT-4, MindForge agents powered by open-weight LLMs significantly outperform their Voyager counterparts in basic tasks, yielding more tech-tree milestones…
Taxonomy
Topics: Embodied and Extended Cognition
