SpaceMoE: Realizing Distributed Mixture-of-Experts Inference over Space Networks
Zhanwei Wang, Huiling Yang, Min Sheng, Khaled B. Letaief, Kaibin Huang

TL;DR
SpaceMoE is a framework for distributed inference of mixture-of-experts (MoE) large language models across satellite networks. It optimizes where model components are placed so that token-generation latency stays low despite limited onboard computing and communication resources.
Contribution
The paper proposes a two-level placement strategy for MoE models in satellite constellations: layer-level placement and intra-layer expert placement, with the expert-to-satellite mapping posed as an optimization problem.
Findings
Achieves at least a threefold latency reduction over baseline strategies.
Formulates expert placement as an optimization problem that accounts for expert activation probabilities.
Uses the ring-like communication pattern of satellite constellations to guide layer placement.
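The two placement levels named above can be illustrated with a toy sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the function names, the round-robin layer mapping, the per-satellite `hop_cost` list, and the simplified objective (expected hop count, i.e. the sum of activation probability times hop cost). For this toy objective, pairing the most frequently activated experts with the cheapest-to-reach satellites is optimal for a one-to-one assignment, by the rearrangement inequality. The paper's actual formulation may differ substantially.

```python
from typing import Dict, List


def place_layers_on_ring(num_layers: int, num_groups: int) -> List[int]:
    """Hypothetical layer placement: layer i goes to ring position i mod num_groups,
    so consecutive layers sit on neighbouring positions of the ring."""
    return [layer % num_groups for layer in range(num_layers)]


def place_experts(activation_prob: List[float], hop_cost: List[int]) -> Dict[int, int]:
    """Hypothetical intra-layer placement: map each expert to one satellite.

    Experts are sorted by decreasing activation probability and satellites by
    increasing hop cost, then paired in order (one expert per satellite).
    """
    experts = sorted(range(len(activation_prob)), key=lambda e: -activation_prob[e])
    satellites = sorted(range(len(hop_cost)), key=lambda s: hop_cost[s])
    return dict(zip(experts, satellites))


def expected_hops(activation_prob: List[float], hop_cost: List[int],
                  mapping: Dict[int, int]) -> float:
    """Toy objective: expected hop count to reach the activated expert."""
    return sum(activation_prob[e] * hop_cost[s] for e, s in mapping.items())


if __name__ == "__main__":
    probs = [0.1, 0.6, 0.3]   # assumed activation probabilities of three experts
    costs = [0, 1, 2]         # assumed hop cost of three satellites in one group
    mapping = place_experts(probs, costs)
    naive = {e: e for e in range(len(probs))}  # expert i on satellite i
    print(place_layers_on_ring(6, 4))
    print(mapping)
    print(expected_hops(probs, costs, mapping), expected_hops(probs, costs, naive))
```

In this example the probability-aware mapping places the most active expert on the zero-hop satellite, which lowers the toy expected hop count relative to the naive expert-i-on-satellite-i mapping.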
Abstract
Leveraging continuous solar energy harvesting at high efficiency, space data centers are envisioned as a promising platform for executing energy-intensive large language models (LLMs). Recognizing this advantage, space and AI conglomerates (e.g., SpaceX, Google) are actively investing in this vision. One key challenge, however, is the efficient distributed deployment of a large-scale LLM in a satellite network due to the limited onboard computing and communication resources. This gives rise to a placement problem that involves partitioning and mapping model components to satellites such that the fundamentally different model architecture and network topology can be reconciled to ensure low-latency token generation. To address this problem, we present the Space Network of Mixture-of-Experts (SpaceMoE) framework targeting the distributed execution of a popular mixture-of-experts (MoE)…
