Phase-Aware Mixture of Experts for Agentic Reinforcement Learning

Shengtian Yang; Yu Li; Shuo He; Yewen Li; Qingpeng Cai; Peng Jiang; Lei Feng

arXiv:2602.17038 · cs.AI · May 20, 2026

TL;DR

This paper introduces PA-MoE, a phase-aware Mixture-of-Experts architecture for agentic reinforcement learning that keeps expert assignments consistent within a phase, improving expert specialization.

Contribution

The paper proposes a phase router that learns phase boundaries from RL objectives, enabling experts to preserve phase-specific knowledge.

Findings

01. PA-MoE outperforms traditional MoE in RL tasks

02. Phase-aware routing improves expert specialization

03. Experimental results validate the effectiveness of PA-MoE

Abstract

Reinforcement learning (RL) has equipped LLM agents with a strong ability to solve complex tasks. However, existing RL methods normally use a single policy network, causing simplicity bias where simple tasks occupy most parameters and dominate gradient updates, leaving insufficient capacity for complex tasks. A plausible remedy could be employing the Mixture-of-Experts (MoE) architecture in the policy network, as MoE allows different parameters (experts) to specialize in different tasks, preventing simple tasks from dominating all parameters. However, a key limitation of traditional MoE is its token-level routing, where the router assigns each token to specialized experts, which fragments phase-consistent patterns into scattered expert assignments and thus undermines expert specialization. In this paper, we propose Phase-Aware Mixture of Experts (PA-MoE). It first…
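The contrast the abstract draws between token-level routing and phase-consistent routing can be illustrated with a toy sketch. This is not the authors' PA-MoE implementation: the router scores are made up, the phase labels in `phase_ids` are supplied by hand (the paper describes learning phase boundaries, which is not attempted here), and the function names `token_level_routing`, `phase_level_routing`, and `count_switches` are hypothetical. It only shows how independent per-token choices can scatter one phase across experts, while a single choice per phase keeps the assignment stable.

```python
from typing import Dict, List, Sequence


def token_level_routing(scores: Sequence[Sequence[float]]) -> List[int]:
    """Top-1 routing: every token independently picks its highest-scoring expert."""
    return [max(range(len(row)), key=row.__getitem__) for row in scores]


def phase_level_routing(
    scores: Sequence[Sequence[float]], phase_ids: Sequence[int]
) -> List[int]:
    """One expert per phase: sum router scores over each phase, then take the argmax.

    ASSUMPTION: phase labels are given. Summing scores is an illustrative
    aggregation rule, not the method described in the paper.
    """
    totals: Dict[int, List[float]] = {}
    for row, pid in zip(scores, phase_ids):
        acc = totals.setdefault(pid, [0.0] * len(row))
        for expert, score in enumerate(row):
            acc[expert] += score
    best = {
        pid: max(range(len(acc)), key=acc.__getitem__)
        for pid, acc in totals.items()
    }
    return [best[pid] for pid in phase_ids]


def count_switches(assignments: Sequence[int]) -> int:
    """Number of positions where the assigned expert changes between neighbours."""
    return sum(a != b for a, b in zip(assignments, assignments[1:]))


# Toy router scores (made up) for 6 tokens over 2 experts.
scores = [
    [0.60, 0.40],
    [0.45, 0.55],
    [0.70, 0.30],
    [0.30, 0.70],
    [0.55, 0.45],
    [0.20, 0.80],
]
# Hand-labelled phases: tokens 0-2 form phase 0, tokens 3-5 form phase 1.
phase_ids = [0, 0, 0, 1, 1, 1]

token_assign = token_level_routing(scores)
phase_assign = phase_level_routing(scores, phase_ids)

print("token-level:", token_assign, "switches:", count_switches(token_assign))
print("phase-level:", phase_assign, "switches:", count_switches(phase_assign))
```

In this toy input, the per-token argmax alternates between the two experts inside each phase, whereas the per-phase choice changes expert only at the phase boundary.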

Taxonomy

Topics: Reinforcement Learning in Robotics · Domain Adaptation and Few-Shot Learning · Stochastic Gradient Optimization Techniques