Phase-Aware Mixture of Experts for Agentic Reinforcement Learning
Shengtian Yang, Yu Li, Shuo He, Yewen Li, Qingpeng Cai, Peng Jiang, Lei Feng

TL;DR
This paper introduces PA-MoE, a phase-aware mixture of experts architecture for reinforcement learning that maintains phase consistency in expert assignments, improving task specialization.
Contribution
It proposes a novel phase router that learns phase boundaries from RL objectives, enabling experts to preserve phase-specific knowledge.
Findings
PA-MoE outperforms traditional MoE in RL tasks
Phase-aware routing improves expert specialization
Experimental results validate the effectiveness of PA-MoE
Abstract
Reinforcement learning (RL) has equipped LLM agents with a strong ability to solve complex tasks. However, existing RL methods normally use a \emph{single} policy network, causing \emph{simplicity bias} where simple tasks occupy most parameters and dominate gradient updates, leaving insufficient capacity for complex tasks. A plausible remedy could be employing the Mixture-of-Experts (MoE) architecture in the policy network, as MoE allows different parameters (experts) to specialize in different tasks, preventing simple tasks from dominating all parameters. However, a key limitation of traditional MoE is its token-level routing, where the router assigns each token to specialized experts, which fragments phase-consistent patterns into scattered expert assignments and thus undermines expert specialization. In this paper, we propose \textbf{Phase-Aware Mixture of Experts (PA-MoE)}. It first…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsReinforcement Learning in Robotics · Domain Adaptation and Few-Shot Learning · Stochastic Gradient Optimization Techniques
