Optimus-1: Hybrid Multimodal Memory Empowered Agents Excel in Long-Horizon Tasks
Zaijing Li, Yuquan Xie, Rui Shao, Gongwei Chen, Dongmei Jiang, Liqiang Nie

TL;DR
Optimus-1 introduces a hybrid multimodal memory system that enhances agent capabilities in long-horizon tasks by explicitly representing world knowledge and summarizing multimodal experiences, leading to significant performance improvements.
Contribution
The paper presents a novel Hybrid Multimodal Memory module that combines a Hierarchical Directed Knowledge Graph and an Abstracted Multimodal Experience Pool to improve long-horizon task performance.
Findings
Optimus-1 outperforms existing agents on long-horizon benchmarks.
Achieves near human-level performance in Minecraft tasks.
Demonstrates strong generalization with MLLMs as backbone.
Abstract
Building a general-purpose agent is a long-standing vision in the field of artificial intelligence. Existing agents have made remarkable progress in many domains, yet they still struggle to complete long-horizon tasks in an open world. We attribute this to the lack of necessary world knowledge and multimodal experience that can guide agents through a variety of long-horizon tasks. In this paper, we propose a Hybrid Multimodal Memory module to address the above challenges. It 1) transforms knowledge into a Hierarchical Directed Knowledge Graph that allows agents to explicitly represent and learn world knowledge, and 2) summarises historical information into an Abstracted Multimodal Experience Pool that provides agents with rich references for in-context learning. On top of the Hybrid Multimodal Memory module, a multimodal agent, Optimus-1, is constructed with dedicated Knowledge-guided Planner…
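To make the idea of explicitly representing world knowledge as a directed graph more concrete, the sketch below shows one way such a graph could turn a goal into an ordered list of sub-goals. It is an illustrative assumption, not the paper's implementation: the item names, the `REQUIRES` mapping, and the `plan_for` helper are hypothetical, and the abstract does not specify how Optimus-1 builds or traverses its graph.

```python
from graphlib import TopologicalSorter

# Hypothetical crafting knowledge (not taken from the paper):
# each item maps to the set of items it directly requires.
REQUIRES = {
    "log": set(),
    "planks": {"log"},
    "stick": {"planks"},
    "crafting_table": {"planks"},
    "wooden_pickaxe": {"planks", "stick", "crafting_table"},
}


def plan_for(goal, requires):
    """Return sub-goals for `goal` in an order that respects every dependency."""
    # Collect only the part of the graph that is reachable from the goal.
    needed = {}
    stack = [goal]
    while stack:
        item = stack.pop()
        if item in needed:
            continue
        needed[item] = set(requires.get(item, set()))
        stack.extend(needed[item])
    # static_order() yields each prerequisite before the items that depend on it.
    return list(TopologicalSorter(needed).static_order())


plan = plan_for("wooden_pickaxe", REQUIRES)
print(plan)
```

In this sketch a planner reads the ordering straight from the graph instead of inferring prerequisites implicitly, which is the kind of explicit knowledge representation the abstract describes. Items with no mutual dependency (here `stick` and `crafting_table`) may appear in either order.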
Taxonomy
Topics: Fuzzy Logic and Control Systems · Multi-Agent Systems and Negotiation · AI-based Problem Solving and Planning
