Optimus-3: Dual-Router Aligned Mixture-of-Experts Agent with Dual-Granularity Reasoning-Aware Policy Optimization

Zaijing Li; Yuquan Xie; Rui Shao; Gongwei Chen; Weili Guan; Dongmei Jiang; Yaowei Wang; Liqiang Nie

arXiv:2506.10357 · cs.AI · February 11, 2026

TL;DR

Optimus-3 is a unified embodied AI agent that integrates reflexive and deliberative reasoning. It uses a novel data generation pipeline, a dual-router architecture, and reasoning-aware policy optimization to excel in complex Minecraft tasks.

Contribution

The paper introduces a comprehensive framework that combines data synthesis, a dual-router architecture, and a new training algorithm to unify System 1 and System 2 reasoning in embodied AI agents.

Findings

01 Outperforms state-of-the-art on multiple System 2 tasks by up to 76%.

02 Achieves a 60% success rate on open-ended tasks.

03 Demonstrates effective integration of dual reasoning systems in complex environments.

Abstract

Developing generalist agents capable of solving open-ended tasks in visually rich, dynamic environments remains a core pursuit of embodied AI. While Minecraft has emerged as a compelling benchmark, existing agents often suffer from fragmented cognitive abilities, lacking the synergy between reflexive execution (System 1) and deliberative reasoning (System 2). In this paper, we introduce Optimus-3, a generalist agent that organically integrates these dual capabilities within a unified framework. To achieve this, we address three fundamental challenges. First, to overcome the scarcity of reasoning data, we propose a Knowledge-Enhanced Automated Data Generation Pipeline. It synthesizes high-quality System 2 reasoning traces from raw System 1 interaction trajectories, effectively mitigating hallucinations via injection of domain knowledge. We release the resulting dataset,…

Taxonomy

Topics: Multimodal Machine Learning Applications · Reinforcement Learning in Robotics · Adversarial Robustness in Machine Learning