SiftMoE: Similarity-Aware Energy-Efficient Expert Selection for Wireless Distributed MoE Inference

Qian Chen; Xianhao Chen; Kaibin Huang

arXiv:2603.23888 · cs.IT · March 26, 2026

Open Access

TL;DR

SiftMoE introduces a similarity-aware expert selection method for wireless distributed MoE inference, reducing energy consumption and communication costs while maintaining accuracy in resource-constrained edge networks.

Contribution

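The paper proposes an expert selection framework for wireless distributed MoE (WIDE) inference that exploits the similarity between experts. It provides theoretical bounds on the accuracy degradation caused by replacing or skipping experts, and optimal selection policies for slow-fading and fast-fading channels.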

Findings

01 Significant energy savings compared to traditional methods.

02 Maintains inference accuracy despite expert skipping.

03 Effective in both slow-fading and fast-fading wireless channels.

Abstract

Mixture-of-Experts (MoE) architectures leverage sparse activation to enhance the scalability of large language models (LLMs), making them suitable for deployment in resource-constrained edge networks. However, the sheer number of experts often exceeds the memory capacity of individual edge nodes, necessitating wireless distributed MoE (WIDE) inference where experts are spread across multiple edge nodes. In this context, expert selection directly affects communication costs. Motivated by the similarity of experts, we propose SiftMoE, which judiciously selects or skips experts to strike a tradeoff between communication costs and inference accuracy. Specifically, we first establish theoretical bounds on the accuracy degradation resulting from expert replacement or skipping. Based on the bounds, we formulate an energy minimization problem for expert selection in WIDE inference subject to…
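
As a rough illustration of the select-or-skip idea described in the abstract, the sketch below swaps a gated expert for a cheaper, sufficiently similar one and skips experts whose gate weight is negligible. It is a toy heuristic under stated assumptions, not the paper's policy: the function name `select_experts`, the thresholds `sim_threshold` and `skip_threshold`, and the per-expert `energy` list (standing in for the communication energy of reaching each expert's node) are all hypothetical.

```python
def select_experts(gate_scores, similarity, energy, k=2,
                   sim_threshold=0.9, skip_threshold=0.05):
    """Toy similarity-aware expert selection (hypothetical heuristic).

    gate_scores: per-expert gate weights for one token.
    similarity:  square matrix of pairwise expert similarity in [0, 1].
    energy:      assumed per-expert communication energy cost.
    Returns (normalized weights per chosen expert, total energy cost).
    """
    n = len(gate_scores)
    # Standard top-k gating: keep the k experts with the largest gate weight.
    top_k = sorted(range(n), key=lambda e: gate_scores[e], reverse=True)[:k]

    chosen = {}
    for e in top_k:
        w = gate_scores[e]
        if w < skip_threshold:
            continue  # skip: contribution assumed too small to justify the cost
        # Replacement candidates: the expert itself plus any similar enough expert.
        candidates = [j for j in range(n)
                      if j == e or similarity[e][j] >= sim_threshold]
        best = min(candidates, key=lambda j: energy[j])  # cheapest candidate
        chosen[best] = chosen.get(best, 0.0) + w

    total = sum(chosen.values())
    weights = {e: w / total for e, w in chosen.items()} if total > 0 else {}
    cost = sum(energy[e] for e in chosen)
    return weights, cost


# Usage: expert 0 is expensive, expert 2 is similar to it and cheaper.
gate_scores = [0.6, 0.3, 0.06, 0.04]
similarity = [
    [1.0, 0.1, 0.95, 0.2],
    [0.1, 1.0, 0.3, 0.2],
    [0.95, 0.3, 1.0, 0.1],
    [0.2, 0.2, 0.1, 1.0],
]
energy = [5.0, 1.0, 2.0, 1.5]
weights, cost = select_experts(gate_scores, similarity, energy)
print(weights, cost)
```

In this example expert 0 is replaced by expert 2, so the total cost falls from 6.0 (experts 0 and 1) to 3.0 (experts 2 and 1). A fixed similarity threshold is only a stand-in here; the paper instead derives its selection from accuracy-degradation bounds and an energy minimization problem.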

Taxonomy

Topics: Privacy-Preserving Technologies in Data · IoT Networks and Protocols · Stochastic Gradient Optimization Techniques