WWW.Serve: Interconnecting Global LLM Services through Decentralization
Huanyu Wang, Ziyu Xia, Zhuoming Chen, Beidi Chen

TL;DR
WWW.Serve is a decentralized framework for interconnecting LLM services worldwide. Participants set their own participation policies and requests are dispatched autonomously, which improves SLO attainment and reduces latency without centralized control.
Contribution
The paper proposes a decentralized LLM serving framework that supports flexible participation policies and self-organizing request dispatch, overcoming the limitations of existing centralized or rigid systems.
Findings
Improves global SLO attainment by up to 1.5x
Reduces latency by 27.6%
Performance approaches or surpasses centralized scheduling
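To make the first two findings concrete, here is a minimal sketch of how such metrics are commonly computed. It assumes SLO attainment means the fraction of requests that finish within a latency target, and that the latency reduction is relative to a baseline mean; the summary does not state the paper's exact definitions. The function names and all sample numbers are hypothetical and are not results from the paper.

```python
def slo_attainment(latencies_s, slo_s):
    """Fraction of requests whose latency is within the SLO target (assumed definition)."""
    if not latencies_s:
        raise ValueError("need at least one request")
    met = sum(1 for t in latencies_s if t <= slo_s)
    return met / len(latencies_s)


def relative_reduction(baseline, improved):
    """Relative drop from baseline to improved, e.g. 0.276 means a 27.6% reduction."""
    return (baseline - improved) / baseline


def mean(values):
    return sum(values) / len(values)


# Hypothetical per-request latencies in seconds (illustration only).
baseline_latencies = [1.0, 2.5, 3.0, 4.0]
improved_latencies = [0.8, 1.5, 1.9, 3.5]
slo_target_s = 2.0  # hypothetical latency target

base_att = slo_attainment(baseline_latencies, slo_target_s)
new_att = slo_attainment(improved_latencies, slo_target_s)
attainment_gain = new_att / base_att  # a value of 1.5 would correspond to "1.5x"

latency_cut = relative_reduction(mean(baseline_latencies), mean(improved_latencies))

print(f"SLO attainment: {base_att:.2f} -> {new_att:.2f} ({attainment_gain:.1f}x)")
print(f"Mean latency reduction: {latency_cut:.1%}")
```

Under these definitions, a "1.5x" improvement is a ratio of attainment rates, while "27.6%" is a relative drop in latency, so the two figures are not directly comparable.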
Abstract
Large language model (LLM) services are mostly centralized, leading to scalability bottlenecks and underutilization of substantial scattered GPU resources. While decentralization offers a promising alternative, existing frameworks primarily focus on cooperation among GPU providers while overlooking their inherent competitive dynamics, imposing substantial constraints such as excessive platform-level oversight or rigid requirements to execute all assigned requests using fixed software stacks on fixed hardware configurations. We argue that such assumptions are unrealistic in real-world decentralized environments. To this end, we propose WWW.Serve, a decentralized framework for interconnecting LLM services worldwide. It allows participants to flexibly determine their participation policies and resource commitments, and supports self-organizing request dispatch, enabling the network to…
Taxonomy
Topics: Big Data and Digital Economy · Scientific Computing and Data Management · Software System Performance and Reliability
