Geometric Regularization in Mixture-of-Experts: The Disconnect Between Weights and Activations
Hyunjun Kim

TL;DR
This paper examines geometric (orthogonality-based) regularization in Mixture-of-Experts models and finds that it fails to reduce expert overlap in either weight space or activation space and does not consistently improve model performance.
Contribution
The study analyzes 7 regularization strengths and reports results on WikiText-103, TinyStories, and PTB, showing that orthogonality-based regularization does not reliably improve expert specialization or model performance in MoE architectures.
Findings
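Orthogonality loss increases weight-space overlap (MSO rises by up to 114%).
Activation-space overlap remains high (~0.6) regardless of regularization.
Performance effects are inconsistent: -0.9% on WikiText-103, +0.9% on TinyStories, and highly variable on PTB (std > 1.0).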
Abstract
Mixture-of-Experts (MoE) models achieve efficiency through sparse activation, but the role of geometric regularization in expert specialization remains unclear. We apply orthogonality loss to enforce expert diversity and find it fails on multiple fronts: it does not reduce weight-space overlap (MSO actually increases by up to 114%), activation-space overlap remains high (~0.6) regardless of regularization, and effects on performance are inconsistent -- marginal improvement on WikiText-103 (-0.9%), slight degradation on TinyStories (+0.9%), and highly variable results on PTB (std > 1.0). Our analysis across 7 regularization strengths reveals no significant correlation (r = -0.293, p = 0.523) between weight and activation orthogonality. These findings demonstrate that weight-space regularization neither achieves its geometric goal nor reliably improves performance, making it unsuitable…
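The abstract refers to an orthogonality loss and an overlap metric (MSO) without defining either. The following is a minimal sketch under stated assumptions, not the paper's implementation: it assumes MSO means the mean squared cosine similarity between distinct pairs of flattened expert vectors, and it assumes the orthogonality loss is the squared Frobenius distance between the experts' normalized Gram matrix and the identity. The function names `unit_rows`, `mean_squared_overlap`, and `orthogonality_loss` are hypothetical.

```python
import numpy as np


def unit_rows(x, eps=1e-12):
    """Scale each row of a 2-D array to unit L2 norm."""
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    return x / np.maximum(norms, eps)


def mean_squared_overlap(vectors):
    """Assumed MSO: mean squared cosine similarity over distinct row pairs.

    Returns 0 for mutually orthogonal rows and 1 for parallel rows.
    """
    v = unit_rows(np.asarray(vectors, dtype=float))
    gram = v @ v.T
    n = gram.shape[0]
    off_diag = gram[~np.eye(n, dtype=bool)]  # drop the self-similarity terms
    return float(np.mean(off_diag ** 2))


def orthogonality_loss(expert_weights):
    """Assumed penalty: ||G - I||_F^2 for the Gram matrix G of normalized experts.

    expert_weights has shape (num_experts, ...); each expert is flattened.
    """
    w = np.asarray(expert_weights, dtype=float)
    flat = unit_rows(w.reshape(w.shape[0], -1))
    gram = flat @ flat.T
    return float(np.sum((gram - np.eye(gram.shape[0])) ** 2))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical setup: 4 experts, each an 8 x 16 weight matrix.
    experts = rng.normal(size=(4, 8, 16))
    print("weight-space MSO:", mean_squared_overlap(experts.reshape(4, -1)))
    print("orthogonality loss:", orthogonality_loss(experts))
```

Under these assumptions the penalty equals n(n-1) times the weight-space MSO for n experts, so minimizing one minimizes the other. The same `mean_squared_overlap` function could be applied to per-expert activation vectors (for example, mean expert outputs over a batch) to compare weight-space and activation-space overlap, which is the comparison the abstract describes.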
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Mobile Crowdsensing and Crowdsourcing · Stochastic Gradient Optimization Techniques
