OmniMoE: An Efficient MoE by Orchestrating Atomic Experts at Scale

Jingze Shi; Zhangyang Peng; Yizhang Zhu; Yifan Wu; Guang Liu; Yuyu Luo

arXiv:2602.05711 · cs.CL · February 6, 2026

Open Access

TL;DR

OmniMoE is a system-algorithm co-designed MoE framework that routes over vector-level Atomic Experts, reaching 50.9% zero-shot accuracy on seven benchmarks while cutting inference latency from 73 ms to 6.7 ms.

Contribution

The paper proposes vector-level Atomic Experts, a Cartesian Product Router, and Expert-Centric Scheduling, which together make MoE at maximal expert granularity scalable and efficient.

Findings

01. Achieves 50.9% zero-shot accuracy on seven benchmarks.

02. Reduces inference latency from 73 ms to 6.7 ms, roughly a 10.9x speedup.

03. Outperforms existing coarse- and fine-grained MoE baselines.

Abstract

Mixture-of-Experts (MoE) architectures are evolving towards finer granularity to improve parameter efficiency. However, existing MoE designs face an inherent trade-off between the granularity of expert specialization and hardware execution efficiency. We propose OmniMoE, a system-algorithm co-designed framework that pushes expert granularity to its logical extreme. OmniMoE introduces vector-level Atomic Experts, enabling scalable routing and execution within a single MoE layer, while retaining a shared dense MLP branch for general-purpose processing. Although this atomic design maximizes capacity, it poses severe challenges for routing complexity and memory access. To address these, OmniMoE adopts a system-algorithm co-design: (i) a Cartesian Product Router that decomposes the massive index space to reduce routing complexity from O(N) to O(sqrt(N)); and (ii) Expert-Centric Scheduling…
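The abstract does not spell out how the Cartesian Product Router works, so the following is only a minimal sketch of one way an index space of N experts could be factored into two axes of sqrt(N) entries each. It assumes an additive, product-key style scoring rule: the query is split into two halves, each half is scored against its own small key table, and expert (i, j) receives the sum of the two partial scores. The names `cartesian_route`, `brute_force_route`, `row_keys`, and `col_keys` are hypothetical and do not come from the paper. Under this assumption the router computes 2·sqrt(N) dot products plus k·k candidate sums, instead of N scores.

```python
import heapq
import random


def dot(a, b):
    """Plain dot product of two equal-length lists."""
    return sum(x * y for x, y in zip(a, b))


def top_k(scores, k):
    """Return the k largest (score, index) pairs, best first."""
    return heapq.nlargest(k, ((s, i) for i, s in enumerate(scores)))


def cartesian_route(query, row_keys, col_keys, k):
    """Select k experts from a len(row_keys) x len(col_keys) grid.

    Assumed (hypothetical) scoring rule: expert (i, j) scores
    dot(q_row, row_keys[i]) + dot(q_col, col_keys[j]).
    Requires k <= len(row_keys) and k <= len(col_keys).
    """
    half = len(query) // 2
    q_row, q_col = query[:half], query[half:]
    # Score each axis separately: 2 * sqrt(N) dot products in total.
    best_rows = top_k([dot(q_row, key) for key in row_keys], k)
    best_cols = top_k([dot(q_col, key) for key in col_keys], k)
    # With additive scores, the global top-k lies inside this k x k grid.
    n_cols = len(col_keys)
    candidates = [
        (rs + cs, ri * n_cols + ci)
        for rs, ri in best_rows
        for cs, ci in best_cols
    ]
    return heapq.nlargest(k, candidates)


def brute_force_route(query, row_keys, col_keys, k):
    """Reference router that scores all N experts, for comparison only."""
    half = len(query) // 2
    q_row, q_col = query[:half], query[half:]
    n_cols = len(col_keys)
    scored = [
        (dot(q_row, rk) + dot(q_col, ck), i * n_cols + j)
        for i, rk in enumerate(row_keys)
        for j, ck in enumerate(col_keys)
    ]
    return heapq.nlargest(k, scored)


random.seed(0)
side, dim, k = 16, 8, 4  # 16 x 16 = 256 experts, toy sizes
row_keys = [[random.gauss(0, 1) for _ in range(dim // 2)] for _ in range(side)]
col_keys = [[random.gauss(0, 1) for _ in range(dim // 2)] for _ in range(side)]
query = [random.gauss(0, 1) for _ in range(dim)]

picked = cartesian_route(query, row_keys, col_keys, k)
print([index for _, index in picked])
```

The factored search matches the exhaustive one here only because the assumed score is a sum of a row term and a column term; a router with a non-additive score would need a different candidate-pruning argument.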

Taxonomy

Topics: Advanced Neural Network Applications · Mobile Crowdsensing and Crowdsourcing · IoT and Edge/Fog Computing