Adaptive Inverted-Index Routing for Granular Mixtures-of-Experts

Klaus-Rudolf Kladny; Maximilian Mordig; Bernhard Schölkopf; Michael Muehlebach

arXiv:2605.04952·cs.LG·May 7, 2026

TL;DR

This paper introduces AIR-MoE, a two-stage routing method for granular mixture-of-experts models that improves efficiency and performance without altering model architecture.

Contribution

AIR-MoE is a novel inverted-index-inspired routing architecture that approximates top-k expert selection efficiently in granular MoE models.

Findings

01

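AIR-MoE improves routing efficiency in granular MoE settings by computing exact routing scores only for a shortlist of candidate experts rather than for all experts.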

02

AIR-MoE achieves better performance than existing routing methods.

03

The method requires no changes to model architecture or loss function.

Abstract

Mixture-of-experts (MoE) models enable scalable transformer architectures by activating only a subset of experts per token. Recent evidence suggests that performance improves with increasingly granular experts, i.e., many small experts instead of a few large ones. However, this regime substantially increases routing cost, which can dominate computation. We introduce adaptive inverted-index routing for MoE (AIR-MoE), an inverted-index-inspired routing architecture based on vector quantization (VQ). In a first stage, AIR-MoE performs coarse shortlisting by assigning tokens to VQ codewords to construct a candidate set of experts. In a second stage, fine scoring computes exact routing scores restricted to this shortlist. This two-stage procedure approximates true top-k routing while avoiding full expert scoring and, in contrast to prior work, imposing no structural constraints on expert…
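The two-stage procedure described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. It assumes dot-product routing scores, a single nearest codeword per token chosen by Euclidean distance, and a fixed-size expert list per codeword built by scoring the codeword against the expert keys. The names `build_inverted_index`, `two_stage_route`, `full_route`, `codebook`, `expert_keys`, and `experts_per_code` are hypothetical, and the sketch does not model how the paper learns or adapts the index.

```python
import numpy as np


def build_inverted_index(codebook, expert_keys, experts_per_code):
    """Map each codeword to a shortlist of expert ids (assumed construction).

    Each codeword keeps the experts whose keys have the largest dot product
    with it. The paper's actual index construction may differ.
    """
    scores = codebook @ expert_keys.T              # (num_codes, num_experts)
    return np.argsort(-scores, axis=1)[:, :experts_per_code]


def two_stage_route(x, codebook, expert_keys, index, k):
    """Approximate top-k routing for a single token vector x."""
    # Stage 1 (coarse shortlisting): assign the token to its nearest codeword.
    code = int(np.argmin(np.sum((codebook - x) ** 2, axis=1)))
    shortlist = index[code]
    # Stage 2 (fine scoring): exact scores, restricted to the shortlist.
    scores = expert_keys[shortlist] @ x
    top = np.argsort(-scores)[:k]
    return shortlist[top], scores[top]


def full_route(x, expert_keys, k):
    """Reference top-k routing that scores every expert."""
    scores = expert_keys @ x
    top = np.argsort(-scores)[:k]
    return top, scores[top]


# Toy setup with random data; all sizes are arbitrary illustration values.
rng = np.random.default_rng(0)
num_experts, dim, num_codes, experts_per_code, k = 256, 16, 8, 32, 4
expert_keys = rng.standard_normal((num_experts, dim))
codebook = rng.standard_normal((num_codes, dim))
x = rng.standard_normal(dim)

index = build_inverted_index(codebook, expert_keys, experts_per_code)
approx_ids, approx_scores = two_stage_route(x, codebook, expert_keys, index, k)
exact_ids, exact_scores = full_route(x, expert_keys, k)

# Fraction of the true top-k experts recovered by the shortlist-based router.
recall = len(set(approx_ids.tolist()) & set(exact_ids.tolist())) / k
print("approximate:", approx_ids, "exact:", exact_ids, "recall:", recall)
```

In this sketch the fine-scoring stage touches `experts_per_code` expert keys per token instead of `num_experts`, plus `num_codes` distance computations for the codeword lookup. When the shortlist covers every expert, the result coincides with full top-k routing, so the shortlist size trades routing cost against approximation quality.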
