Excitation: Momentum For Experts

Sagi Shaier

arXiv:2602.21798 · cs.LG · February 26, 2026

TL;DR

Excitation is an optimizer framework for sparse models that modulates each expert's parameter updates according to its batch-level utilization, improving training stability and performance in Mixture-of-Experts architectures.

Contribution

It introduces a dynamic update-modulation method that improves training stability and performance in sparse, expert-based models and addresses "structural confusion", a failure mode in deep MoEs where standard optimizers fail to establish functional signal paths.

Findings

01. Accelerates convergence in language and vision MoE models

02. Rescues models from structural confusion during training

03. Consistently improves final performance across tasks

Abstract

We propose Excitation, a novel optimization framework designed to accelerate learning in sparse architectures such as Mixture-of-Experts (MoEs). Unlike traditional optimizers that treat all parameters uniformly, Excitation dynamically modulates updates using batch-level expert utilization. It introduces a competitive update dynamic that amplifies updates to highly utilized experts and can selectively suppress low-utilization ones, effectively sharpening routing specialization. Notably, we identify a phenomenon of "structural confusion" in deep MoEs, where standard optimizers fail to establish functional signal paths; Excitation acts as a specialization catalyst, "rescuing" these models and enabling stable training where baselines remain trapped. Excitation is optimizer-, domain-, and model-agnostic, requires minimal integration effort, and introduces neither additional per-parameter…
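The abstract does not state the modulation rule itself, so the following is only a minimal sketch under explicit assumptions. Utilization is taken to be the fraction of a batch's routed tokens assigned to each expert. Each expert's SGD step is then multiplied by (utilization × number of experts) raised to a hypothetical exponent `alpha`, clipped to a range. Experts above the uniform share 1/E are amplified, experts below it are damped, and an expert that received no tokens is suppressed entirely. The names `expert_utilization`, `excitation_scales`, `excited_sgd_step`, `alpha`, `min_scale` and `max_scale` are all illustrative and do not come from the paper.

```python
from collections import Counter
from typing import List, Sequence

# NOTE: hypothetical sketch. The abstract does not give Excitation's exact
# modulation rule; the power-law scaling below is an illustrative assumption.


def expert_utilization(assignments: Sequence[int], num_experts: int) -> List[float]:
    """Fraction of this batch's routed tokens assigned to each expert."""
    counts = Counter(assignments)
    total = len(assignments)
    return [counts.get(e, 0) / total for e in range(num_experts)]


def excitation_scales(
    utilization: Sequence[float],
    alpha: float = 1.0,      # assumed sharpness exponent, must be > 0
    min_scale: float = 0.0,  # floor: 0.0 lets idle experts be fully suppressed
    max_scale: float = 4.0,  # ceiling that keeps amplified steps bounded
) -> List[float]:
    """Per-expert update multipliers relative to the uniform share 1/E."""
    num_experts = len(utilization)
    scales = []
    for u in utilization:
        relative = u * num_experts  # 1.0 means exactly the uniform share
        scale = relative ** alpha   # > 1 amplifies, < 1 damps
        scales.append(min(max(scale, min_scale), max_scale))
    return scales


def excited_sgd_step(
    expert_params: List[List[float]],
    expert_grads: List[List[float]],
    scales: Sequence[float],
    lr: float,
) -> List[List[float]]:
    """Plain SGD where each parameter of expert e moves by lr * scale_e * grad."""
    return [
        [w - lr * s * g for w, g in zip(params, grads)]
        for params, grads, s in zip(expert_params, expert_grads, scales)
    ]


if __name__ == "__main__":
    # 8 tokens routed top-1 across 4 experts; expert 3 receives no traffic.
    util = expert_utilization([0, 0, 0, 1, 1, 2, 0, 1], num_experts=4)
    scales = excitation_scales(util)
    print(util)    # [0.5, 0.375, 0.125, 0.0]
    print(scales)  # [2.0, 1.5, 0.5, 0.0]
    params = [[1.0], [1.0], [1.0], [1.0]]
    grads = [[0.1], [0.1], [0.1], [0.1]]
    print(excited_sgd_step(params, grads, scales, lr=1.0))
```

With `alpha = 1` and no clipping, the multipliers average exactly 1 across experts because the utilizations sum to 1. Under that setting the assumed rule redistributes step size among experts rather than inflating the global learning rate. This is a property of the sketch, not a claim about how Excitation itself is calibrated. The sketch also keeps only one scalar per expert per batch, with no extra per-parameter state.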

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Stochastic Gradient Optimization Techniques · Multimodal Machine Learning Applications