SPHERE: Mitigating the Loss of Spectral Plasticity in Mixture-of-Experts for Deep Reinforcement Learning
Lirui Luo, Guoxi Zhang, Hongming Xu, Cong Fang, Qing Li

TL;DR
This paper introduces SPHERE, a regularization method based on spectral plasticity theory, to mitigate plasticity loss in Mixture-of-Experts policies for continual deep reinforcement learning, improving performance and preserving learning capacity.
Contribution
The paper formalizes plasticity loss in MoE policies as a loss of spectral plasticity using NTK theory, and proposes SPHERE, a practical Parseval penalty that improves continual learning performance.
Findings
SPHERE improves success rates by 133% on MetaWorld.
SPHERE improves success rates by 50% on HumanoidBench.
SPHERE maintains higher spectral plasticity during training.
Abstract
In deep reinforcement learning (DRL), an agent is trained from a stream of experience. In a continual learning setting, such agents can suffer from plasticity loss: their ability to learn new skills from new experiences diminishes over training. Recently, Mixture-of-Experts (MoE) networks have been reported to enable scaling laws and facilitate the learning of diverse skills. However, in continual reinforcement learning settings, their performance can degenerate as learning proceeds, indicating a loss of plasticity. To address this, building on Neural Tangent Kernel (NTK) theory, we formalize the plasticity loss in MoE policies as a loss of spectral plasticity. We then derive a tractable proxy for spectral plasticity, one expressible in terms of individual expert feature matrices. Leveraging this proxy, we introduce SPHERE, a practical Parseval penalty tailored for MoE-based policies…
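The abstract does not give SPHERE's exact formula, so the following is only a minimal sketch of a generic Parseval-style penalty applied to per-expert feature matrices. The function names, the 1/n normalization of the Gram matrix, and the averaging over experts are illustrative assumptions, not the authors' formulation. The idea it shows is that a feature matrix whose Gram matrix is close to the identity has a flat spectrum, while collapsed (low-rank) features are penalized heavily.

```python
from typing import Sequence

import numpy as np


def parseval_penalty(features: np.ndarray) -> float:
    """Generic Parseval-style penalty ||F^T F / n - I||_F^2.

    `features` is an (n, d) matrix of n samples with d features.
    The 1/n scaling is an assumption made for this sketch.
    """
    n, d = features.shape
    gram = features.T @ features / n
    return float(np.sum((gram - np.eye(d)) ** 2))


def moe_parseval_penalty(expert_features: Sequence[np.ndarray]) -> float:
    """Average the penalty over experts (hypothetical aggregation rule)."""
    return float(np.mean([parseval_penalty(f) for f in expert_features]))


rng = np.random.default_rng(0)
n, d = 64, 8

# Orthonormal columns scaled by sqrt(n): the Gram matrix is the identity.
q, _ = np.linalg.qr(rng.standard_normal((n, d)))
well_conditioned = q * np.sqrt(n)

# Rank-1 features: the spectrum has collapsed onto a single direction.
collapsed = np.outer(rng.standard_normal(n), rng.standard_normal(d))

penalty_good = parseval_penalty(well_conditioned)
penalty_bad = parseval_penalty(collapsed)
penalty_mixed = moe_parseval_penalty([well_conditioned, collapsed])
```

In a training loop, a term like `moe_parseval_penalty` would be added to the policy loss with a small coefficient. How SPHERE weights or routes this term across experts is not stated in the abstract.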
