Galaxy: A Resource-Efficient Collaborative Edge AI System for In-situ Transformer Inference
Shengyuan Ye, Jiangsu Du, Liekang Zeng, Wenzhong Ou, Xiaowen Chu, Yutong Lu, Xu Chen

TL;DR
Galaxy is a collaborative edge AI system that efficiently accelerates Transformer inference by leveraging idle resources across heterogeneous edge devices, significantly reducing latency and addressing privacy concerns.
Contribution
The paper introduces Galaxy, a novel resource-sharing framework with hybrid model parallelism and communication-computation overlap for efficient in-situ Transformer inference at the edge.
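This summary does not spell out how Galaxy partitions a model, so the following is only a minimal sketch of the general idea behind heterogeneity-aware tensor parallelism, not the authors' implementation. It assumes a two-layer MLP block with a ReLU activation, splits the hidden dimension across devices in proportion to hypothetical integer capacity scores, and sums the partial outputs in a loop that stands in for an all-reduce. All function names and the capacity values are illustrative assumptions.

```python
def split_sizes(total, capacities):
    """Split `total` hidden units into integer shares proportional to
    integer device capacities (hypothetical scores, not from the paper)."""
    weight = sum(capacities)
    shares = [total * c // weight for c in capacities]
    # Hand out units lost to integer division, largest capacity first.
    order = sorted(range(len(capacities)), key=lambda i: -capacities[i])
    for i in order[: total - sum(shares)]:
        shares[i] += 1
    return shares


def matvec(x, w):
    """Multiply row vector x (length n) by matrix w (n rows, m columns)."""
    return [sum(xi * row[j] for xi, row in zip(x, w)) for j in range(len(w[0]))]


def relu(v):
    return [max(0.0, a) for a in v]


def mlp(x, w1, w2):
    """Reference single-device MLP: relu(x @ w1) @ w2."""
    return matvec(relu(matvec(x, w1)), w2)


def parallel_mlp(x, w1, w2, capacities):
    """Same MLP, with the hidden dimension partitioned across devices.

    Each device holds a column slice of w1 and the matching row slice of w2,
    so it can compute its partial output without talking to the others.
    """
    shares = split_sizes(len(w2), capacities)
    out = [0.0] * len(w2[0])
    start = 0
    for size in shares:
        end = start + size
        if size:  # a device with a zero share does no work
            w1_slice = [row[start:end] for row in w1]  # columns start..end
            w2_slice = w2[start:end]                   # rows start..end
            partial = matvec(relu(matvec(x, w1_slice)), w2_slice)
            # Summing partial outputs stands in for an all-reduce.
            out = [a + b for a, b in zip(out, partial)]
        start = end
    return out


x = [1, -2, 3]
w1 = [[1, -1, 2, 0], [0, 1, -1, 2], [1, 0, 1, -1]]
w2 = [[1, 0], [2, 1], [0, -1], [1, 1]]
# Two hypothetical devices, the first three times as capable as the second.
result = parallel_mlp(x, w1, w2, capacities=[3, 1])
reference = mlp(x, w1, w2)
```

Because the activation is elementwise, splitting the first weight matrix by columns and the second by rows gives the same result as the single-device computation. Communication is needed only once, when the partial outputs are summed. Overlapping that communication with computation, which the contribution statement mentions, is outside the scope of this sketch.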
Findings
Achieves up to 2.5x latency reduction compared to state-of-the-art methods.
Effectively utilizes idle resources across heterogeneous edge devices.
Demonstrates significant performance improvements in various edge environments.
Abstract
Transformer-based models have unlocked a plethora of powerful intelligent applications at the edge, such as voice assistants in smart homes. Traditional deployment approaches offload the inference workloads to a remote cloud server, which induces substantial pressure on the backbone network and raises users' privacy concerns. To address that, in-situ inference has recently been recognized for edge intelligence, but it still confronts significant challenges stemming from the conflict between intensive workloads and limited on-device computing resources. In this paper, we leverage our observation that many edge environments usually comprise a rich set of accompanying trusted edge devices with idle resources and propose Galaxy, a collaborative edge AI system that breaks the resource walls across heterogeneous edge devices for efficient Transformer inference acceleration. Galaxy…
Taxonomy
Topics: Power Transformer Diagnostics and Insulation · Energy Load and Power Forecasting · Neural Networks and Applications
Methods: Attention Is All You Need · Sparse Evolutionary Training · Linear Layer · Byte Pair Encoding · Label Smoothing · Adam · Residual Connection · Position-Wise Feed-Forward Layer · Multi-Head Attention · Dropout
