CoMoE: Collaborative Optimization of Expert Aggregation and Offloading for MoE-based LLMs at Edge

Muqing Li; Ning Li; Xin Yuan; Wenchao Xu; Quan Chen; Song Guo; Haijun Zhang

arXiv:2508.09208 · cs.NI · August 14, 2025

TL;DR

CoMoE is a dynamic, resource-aware framework that jointly optimizes expert aggregation and offloading for MoE-based large language models on mobile edge devices, reducing memory footprint and inference latency while maintaining model performance.

Contribution

CoMoE introduces an adaptive optimization framework for MoE deployment in mobile edge environments that adjusts expert aggregation and offloading dynamically as device resources and network conditions change.

Findings

01. 70% memory reduction compared to baselines

02. 10.5% lower inference latency than existing methods

03. Enables deployment of large-scale MoE models on resource-constrained devices

Abstract

The proliferation of large language models (LLMs) has driven the adoption of Mixture-of-Experts (MoE) architectures as a promising solution to scale model capacity while controlling computational costs. However, deploying MoE models in resource-constrained mobile edge computing environments presents significant challenges due to their large memory footprint and dynamic expert activation patterns. To address these challenges, we propose a novel dynamic resource-aware collaborative optimization framework that jointly optimizes expert aggregation granularity and offloading strategies based on real-time device resource states, network conditions, and input characteristics in mobile edge environments, denoted as CoMoE. In CoMoE, we first systematically analyze existing expert aggregation techniques, including expert parameter merging, knowledge distillation, and parameter sharing…
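
To make the offloading idea concrete, here is a minimal sketch of a memory-budgeted placement decision. It is not the CoMoE algorithm, which the abstract does not specify in detail; it is a generic greedy heuristic that keeps the experts with the highest activation frequency per megabyte on the device and offloads the rest. The names `Expert`, `plan_offloading`, `size_mb`, `activation_freq`, and `memory_budget_mb` are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Expert:
    # Hypothetical description of one MoE expert (illustrative only).
    name: str
    size_mb: float          # memory footprint of the expert's weights
    activation_freq: float  # fraction of recent tokens routed to this expert


def plan_offloading(experts, memory_budget_mb):
    """Greedy placement sketch (an assumption, not the paper's method).

    Experts are ranked by activation frequency per MB. Each one is kept
    on the device if it still fits in the remaining budget; otherwise it
    is marked as offloaded.
    """
    ranked = sorted(
        experts,
        key=lambda e: e.activation_freq / e.size_mb,
        reverse=True,
    )
    on_device, offloaded, used_mb = [], [], 0.0
    for expert in ranked:
        if used_mb + expert.size_mb <= memory_budget_mb:
            on_device.append(expert.name)
            used_mb += expert.size_mb
        else:
            offloaded.append(expert.name)
    return on_device, offloaded, used_mb
```

A resource-aware system in the spirit of the abstract would presumably re-run such a decision as the memory budget, network conditions, and activation statistics change; this sketch covers only a single static decision.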
