Expert Routing for Communication-Efficient MoE via Finite Expert Banks

Mohammad Reza Deylam Salehi; Ali Khalesi

arXiv:2605.05278 · cs.LG · May 8, 2026

TL;DR

This paper introduces a finite-bank approach to analyze and optimize resource-efficient sparse Mixture-of-Experts models by quantifying routing information and generalization using information-theoretic measures.

Contribution

The paper develops a practical framework that combines finite expert banks with information theory to analyze and improve expert-routing efficiency in sparse MoE architectures.

Findings

1. Mutual information I(S;W) tracks the generalization gap.
2. The Xu-Raginsky bound is looser than empirical estimates.
3. The framework enables analysis of resource-aware MoE inference systems.
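For reference, the Xu-Raginsky result bounds the expected generalization gap by $\sqrt{2\sigma^2 I(S;W)/n}$ when the loss is $\sigma$-sub-Gaussian for every hypothesis. The sketch below simply evaluates that expression. The function name `xu_raginsky_bound`, the default $\sigma = 1/2$ (valid for any loss bounded in $[0,1]$ by Hoeffding's lemma), and the example values $K = 10$, $n = 1000$ are illustrative assumptions, not settings or results from the paper. Because the selected model comes from a finite bank of $K$ experts, $I(S;W) \le H(W) \le \log K$, which gives a data-free worst case to plug in.

```python
import math


def xu_raginsky_bound(mi_nats: float, n: int, sigma: float = 0.5) -> float:
    """sqrt(2 * sigma^2 * I(S;W) / n), the Xu-Raginsky bound on |E[gen gap]|.

    Assumes the loss is sigma-sub-Gaussian for every hypothesis w; a loss
    bounded in [0, 1] satisfies this with sigma = 0.5 (Hoeffding's lemma).
    `mi_nats` is I(S;W) in nats and `n` is the number of training samples.
    """
    if mi_nats < 0 or n <= 0:
        raise ValueError("require I(S;W) >= 0 and n > 0")
    return math.sqrt(2.0 * sigma ** 2 * mi_nats / n)


# Hypothetical setting: a bank of K = 10 experts and n = 1000 samples.
# Since W takes at most K values, I(S;W) <= H(W) <= log K.
K, n_samples = 10, 1000
worst_case = xu_raginsky_bound(math.log(K), n_samples)
print(f"bound with I(S;W) = log K: {worst_case:.4f}")
```

The bound uses natural logarithms, so $I(S;W)$ must be in nats; a value measured in bits converts by multiplying by $\ln 2$.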

Abstract

Resource-efficient machine learning increasingly uses sparse Mixture-of-Experts (MoE) architectures, where the gate acts as both a learning component and a routing interface controlling computation, communication, and accuracy. Motivated by finite-rate interpretations of MoE gating, we treat the gate as a stochastic channel and use $I(X;T)$ to quantify the routing information available to the selected expert. To make the associated information quantities tractable beyond synthetic examples, we develop a finite-bank MNIST construction using pretrained CNN experts and a discrete, data-dependent selection rule. Since the selected model belongs to a finite candidate set, the algorithmic mutual information $I(S;W)$ admits a closed-form discrete-entropy estimator from the empirical posterior $q(W \mid S)$. Sweeping a data-dependence parameter $\alpha$, we observe that $I(S;W)$ …
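Because both the gate output $T$ and the selected model $W$ take finitely many values, both information quantities reduce to discrete mutual information between a prior and a channel. The following is a minimal sketch, not the paper's estimator, assuming the plug-in identity $I(A;B) = H(B) - H(B \mid A)$, a uniform weight $1/n$ on $n$ resampled training sets $S_i$ whose rows are the empirical posteriors $q(W \mid S_i)$, and a discretized input $X$ (for example, input clusters) for the gate channel $p(T \mid X)$. The helper names `entropy_nats` and `mutual_information` and both toy matrices are hypothetical.

```python
import numpy as np


def entropy_nats(p) -> float:
    """Shannon entropy in nats, using the convention 0 * log 0 = 0."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return float(-np.sum(p * np.log(p)))


def mutual_information(prior, channel) -> float:
    """I(A;B) in nats for a discrete prior p(a) and channel p(b | a).

    `channel` has shape (|A|, |B|) and each row sums to 1.
    Computes H(B) - sum_a p(a) H(B | A = a).
    """
    prior = np.asarray(prior, dtype=float)
    channel = np.asarray(channel, dtype=float)
    marginal = prior @ channel  # p(b) = sum_a p(a) p(b | a)
    cond = sum(pa * entropy_nats(row) for pa, row in zip(prior, channel))
    return entropy_nats(marginal) - float(cond)


# Hypothetical example 1: I(S;W) for a bank of 3 experts.
# Row i is the empirical posterior q(W | S_i) for the i-th resampled training set.
q_w_given_s = np.array([
    [1.0, 0.0, 0.0],
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
])
n = q_w_given_s.shape[0]
i_sw = mutual_information(np.full(n, 1.0 / n), q_w_given_s)

# Hypothetical example 2: I(X;T) for a noisy 2-way gate over 2 input clusters.
p_x = np.array([0.5, 0.5])
gate = np.array([
    [0.9, 0.1],
    [0.1, 0.9],
])
i_xt = mutual_information(p_x, gate)

print(f"I(S;W) = {i_sw:.4f} nats, I(X;T) = {i_xt:.4f} nats")
```

With one-hot rows, selection is deterministic, so $H(W \mid S) = 0$ and the first example returns $H(W) = 1.5 \ln 2$ nats. At the other extreme, if every row is the same distribution (selection independent of the data), the estimate is zero.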
