WANSpec: Leveraging Global Compute Capacity for LLM Inference

Noah Martin; Fahad Dogar

arXiv:2602.18931·cs.DC·February 24, 2026

Open Access

TL;DR

WANSpec leverages under-utilized global data centers and speculative decoding to optimize LLM inference, reducing latency and computational load by intelligently offloading parts of the workload across geographically distributed resources.

Contribution

This work introduces WANSpec, an approach that uses speculative decoding to shift parts of LLM inference to under-utilized data centers, improving efficiency and reducing latency.

Findings

01. Reduces forward passes of speculative decoding by over 50%

02. Mitigates capacity issues in high-demand data centers

03. Effectively utilizes global compute resources for LLM inference

Abstract

Data centers capable of running large language models (LLMs) are spread across the globe. Some have high-end GPUs for running the most advanced models (100B+ parameters), and others are only suitable for smaller models (1B parameters). The most capable GPUs are in high demand thanks to the rapidly expanding applications of LLMs. Because of this demand, the choice of location for an LLM inference workload can affect request latency. In this work, we explore options to shift some aspects of inference to the under-utilized data centers. We first observe the varying delays affecting inference in AWS services from different regions, demonstrating that load is not spread evenly. We then introduce WANSpec, which offloads part of LLM generation to the under-utilized data centers. In doing so, WANSpec can mitigate capacity issues as well as effectively use…
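For readers unfamiliar with the speculative decoding the abstract refers to, the following is a minimal sketch of the generic greedy variant, not WANSpec's actual algorithm or placement policy. A cheap draft model proposes a few tokens and the expensive target model verifies them. The names `speculative_decode`, `target_next`, `draft_next`, and the toy next-token functions are hypothetical and stand in for real models; the per-position verification loop stands in for what a real target model would do in one batched forward pass.

```python
from typing import Callable, List, Tuple

Token = int
NextTokenFn = Callable[[List[Token]], Token]


def greedy_decode(next_fn: NextTokenFn, prompt: List[Token], n: int) -> List[Token]:
    """Plain autoregressive decoding: one model call per generated token."""
    tokens = list(prompt)
    for _ in range(n):
        tokens.append(next_fn(tokens))
    return tokens


def speculative_decode(
    target_next: NextTokenFn,
    draft_next: NextTokenFn,
    prompt: List[Token],
    max_new_tokens: int,
    k: int = 4,
) -> Tuple[List[Token], int, int]:
    """Greedy speculative decoding; returns (tokens, target_passes, draft_passes)."""
    tokens = list(prompt)
    goal = len(prompt) + max_new_tokens
    target_passes = 0
    draft_passes = 0
    while len(tokens) < goal:
        # 1. The draft model proposes up to k tokens, one call per token.
        proposal: List[Token] = []
        for _ in range(min(k, goal - len(tokens))):
            proposal.append(draft_next(tokens + proposal))
            draft_passes += 1
        # 2. The target model checks every proposed position. A real model
        #    would score them all in a single batched pass, so we count one.
        target_passes += 1
        for i, tok in enumerate(proposal):
            expected = target_next(tokens + proposal[:i])
            if tok != expected:
                # Keep the agreed prefix, then the target's correction.
                tokens.extend(proposal[:i])
                tokens.append(expected)
                break
        else:
            # Everything was accepted; the same pass also yields one extra token.
            tokens.extend(proposal)
            if len(tokens) < goal:
                tokens.append(target_next(tokens))
    return tokens, target_passes, draft_passes


# Toy stand-ins (assumptions for illustration only): the draft agrees with
# the target except when the context length is a multiple of 7.
def toy_target(ctx: List[Token]) -> Token:
    return (3 * ctx[-1] + 1) % 11


def toy_draft(ctx: List[Token]) -> Token:
    guess = toy_target(ctx)
    return (guess + 1) % 11 if len(ctx) % 7 == 0 else guess


if __name__ == "__main__":
    out, t_passes, d_passes = speculative_decode(toy_target, toy_draft, [1], 20)
    print(out == greedy_decode(toy_target, [1], 20), t_passes, d_passes)
```

The output matches plain greedy decoding with the target model, while the target runs fewer passes than it generates tokens. Which data center hosts the draft or the target, and how wide-area latency is handled, are questions this sketch does not attempt to model.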

Taxonomy

Topics: Big Data and Digital Economy · Scientific Computing and Data Management · Software System Performance and Reliability