EC2MoE: Adaptive End-Cloud Pipeline Collaboration Enabling Scalable Mixture-of-Experts Inference

Zheming Yang, Yunqing Hu, Sheng Sun, and Wen Ji

arXiv:2508.06024·cs.DC·August 11, 2025

TL;DR

EC2MoE is an adaptive end-cloud pipeline collaboration framework for Mixture-of-Experts inference across heterogeneous environments. It reports 2.2x to 5.1x higher throughput and 53% to 67% lower end-to-end latency.

Contribution

The paper presents a hardware-aware expert selection mechanism and a pipeline optimization strategy for scalable MoE inference in end-cloud settings.

Findings

01. Increases throughput by 2.2x to 5.1x.

02. Reduces end-to-end latency by 53% to 67%.

03. Maintains high accuracy and scalability under dynamic conditions.

Abstract

The Mixture-of-Experts (MoE) paradigm has emerged as a promising solution to scale up model capacity while maintaining inference efficiency. However, deploying MoE models across heterogeneous end-cloud environments poses new challenges in expert scheduling, communication overhead, and resource heterogeneity. In this paper, we propose EC2MoE, an adaptive framework for scalable MoE inference via end-cloud pipeline collaboration. First, we design a hardware-aware lightweight group gate network that enhances expert selection and computational efficiency. By incorporating a hardware-aware local expert selection mechanism, the system adaptively filters candidate experts based on real-time device profiles. A lightweight group gate module then integrates local and global gating outputs to achieve high-quality expert routing with minimal overhead. Second, we develop a pipeline optimization…
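
The routing idea in the abstract (filter candidate experts against a device profile, then combine local and global gating outputs) can be illustrated with a minimal sketch. This is not the paper's implementation: the `Expert` and `DeviceProfile` fields, the memory-only filter, the convex combination controlled by `alpha`, and the top-k softmax are all assumptions chosen for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class Expert:
    expert_id: int
    group: int        # hypothetical expert-group index
    memory_mb: float  # hypothetical memory needed to run this expert


@dataclass
class DeviceProfile:
    free_memory_mb: float  # hypothetical real-time measurement on the end device


def filter_candidates(experts, profile):
    """Keep only experts that fit the device's current free memory (assumed criterion)."""
    return [e for e in experts if e.memory_mb <= profile.free_memory_mb]


def route(experts, profile, local_scores, group_scores, alpha=0.5, top_k=2):
    """Return [(expert_id, weight)] for the top-k feasible experts.

    local_scores: expert_id -> score from a local gate (assumed input)
    group_scores: group -> score from a global/group gate (assumed input)
    alpha: assumed mixing weight between local and group scores
    """
    candidates = filter_candidates(experts, profile)
    if not candidates:
        return []  # nothing fits locally; a real system might defer to the cloud

    # Fuse the two gating signals with a convex combination (an assumption).
    fused = [
        (alpha * local_scores[e.expert_id] + (1 - alpha) * group_scores[e.group],
         e.expert_id)
        for e in candidates
    ]
    fused.sort(key=lambda pair: pair[0], reverse=True)
    top = fused[:top_k]

    # Softmax over the selected experts, shifted by the max for numerical stability.
    peak = top[0][0]
    exps = [math.exp(score - peak) for score, _ in top]
    total = sum(exps)
    return [(expert_id, w / total) for (_, expert_id), w in zip(top, exps)]
```

In this sketch an expert that scores highly under the local gate is still skipped if it does not fit the device profile, which is the behavior the abstract attributes to hardware-aware filtering. How EC2MoE actually measures device state and merges the gates is not specified in the text above.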

Taxonomy

Topics: Scientific Computing and Data Management · Data Stream Mining Techniques · Big Data and Business Intelligence