SiftMoE: Similarity-Aware Energy-Efficient Expert Selection for Wireless Distributed MoE Inference
Qian Chen, Xianhao Chen, Kaibin Huang

TL;DR
SiftMoE introduces a similarity-aware expert selection method for wireless distributed MoE inference, reducing energy consumption and communication costs while maintaining accuracy in resource-constrained edge networks.
Contribution
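The paper proposes an expert selection framework for wireless distributed MoE (WIDE) inference that exploits the similarity between experts. It derives theoretical bounds on the accuracy degradation caused by replacing or skipping experts, and uses them to obtain optimal selection policies for slow-fading and fast-fading channels.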
Findings
Achieves significant energy savings compared to traditional methods.
Maintains inference accuracy despite expert skipping.
Remains effective in both slow-fading and fast-fading wireless channels.
Abstract
Mixture-of-Experts (MoE) architectures leverage sparse activation to enhance the scalability of large language models (LLMs), making them suitable for deployment in resource-constrained edge networks. However, the sheer number of experts often exceeds the memory capacity of individual edge nodes, necessitating wireless distributed MoE (WIDE) inference where experts are spread across multiple edge nodes. In this context, expert selection directly affects communication costs. Motivated by the similarity of experts, we propose SiftMoE, which judiciously selects or skips experts to strike a tradeoff between communication costs and inference accuracy. Specifically, we first establish theoretical bounds on the accuracy degradation resulting from expert replacement or skipping. Based on the bounds, we formulate an energy minimization problem for expert selection in WIDE inference subject to…
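The select-or-skip idea described in the abstract can be illustrated with a small sketch. This is not the paper's algorithm or its optimal policy: it is a hypothetical greedy heuristic, and the function name `sift_experts`, the cosine-similarity measure, the per-expert `energy_cost` vector, and both thresholds are assumptions introduced here for illustration only. For each of the top-k experts picked by the gate, the sketch skips the expert if its gate score is negligible, and otherwise swaps in a sufficiently similar expert whose node is cheaper to reach.

```python
import numpy as np


def cosine_similarity(a, b):
    """Cosine similarity between two flattened expert weight vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


def sift_experts(gate_scores, expert_weights, energy_cost,
                 top_k=2, sim_threshold=0.9, skip_threshold=0.05):
    """Hypothetical greedy select/replace/skip heuristic (not the paper's policy).

    gate_scores:    shape (E,), router scores for one token
    expert_weights: shape (E, D), one flattened weight vector per expert
    energy_cost:    shape (E,), assumed energy to reach each expert's node
    Returns the list of expert indices actually used.
    """
    num_experts = len(gate_scores)
    # Standard top-k routing: experts with the largest gate scores.
    top = np.argsort(gate_scores)[::-1][:top_k]
    chosen = []
    for e in top:
        e = int(e)
        # Skip: the expert contributes too little to justify the transmission.
        if gate_scores[e] < skip_threshold:
            continue
        # Replace: look for a cheaper expert that is similar enough to e.
        best = e
        for c in range(num_experts):
            if c == e or c in chosen:
                continue
            similar = cosine_similarity(expert_weights[e],
                                        expert_weights[c]) >= sim_threshold
            if similar and energy_cost[c] < energy_cost[best]:
                best = c
        if best not in chosen:
            chosen.append(best)
    return chosen


# Toy example: expert 1 is nearly identical to expert 0 but far cheaper to reach.
gate_scores = np.array([0.60, 0.10, 0.28, 0.02])
expert_weights = np.array([[1.0, 0.0], [0.99, 0.05], [0.0, 1.0], [-1.0, 0.0]])
energy_cost = np.array([5.0, 1.0, 2.0, 1.0])

selected = sift_experts(gate_scores, expert_weights, energy_cost)
print("selected experts:", selected)
print("total energy:", float(energy_cost[selected].sum()))
```

In this toy setup, raising `skip_threshold` trades accuracy for energy by dropping more low-score experts, while lowering `sim_threshold` permits more aggressive replacement. The paper instead derives its policy from bounds on accuracy degradation, which this sketch does not attempt to reproduce.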
Taxonomy
Topics: Privacy-Preserving Technologies in Data · IoT Networks and Protocols · Stochastic Gradient Optimization Techniques
