SkyWalker: A Locality-Aware Cross-Region Load Balancer for LLM Inference

Tian Xia; Ziming Mao; Jamison Kerney; Ethan J. Jackson; Zhifei Li; Jiarong Xing; Scott Shenker; Ion Stoica

arXiv:2505.24095·cs.DC·November 10, 2025

TL;DR

SkyWalker is a novel multi-region load balancer for LLM inference that improves throughput, reduces latency, and cuts costs by intelligently aggregating regional traffic patterns while maintaining cache locality.

Contribution

It introduces a cache-aware, cross-region traffic handling mechanism that enables cost-effective and efficient multi-region LLM serving with preserved KV-Cache locality.
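As a rough illustration of what cache-aware, cross-region routing could look like, here is a minimal sketch. It is not SkyWalker's actual algorithm, which this summary does not describe. The `Replica` class, the `route` function, the string-prefix matching, and the "local first, remote only on overflow" policy are all hypothetical simplifications.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """Hypothetical model of one LLM serving replica."""
    name: str
    region: str
    capacity: int                      # max in-flight requests (assumed limit)
    in_flight: int = 0
    cached_prefixes: set = field(default_factory=set)

    def has_room(self) -> bool:
        return self.in_flight < self.capacity


def shared_prefix_len(prompt: str, prefixes: set) -> int:
    """Length of the longest cached prefix that the prompt starts with."""
    return max((len(p) for p in prefixes if prompt.startswith(p)), default=0)


def route(prompt: str, client_region: str, replicas: list):
    """Pick a replica: local ones first, remote only when local ones are full."""
    candidates = [r for r in replicas if r.has_room()]
    if not candidates:
        return None  # every replica is saturated
    local = [r for r in candidates if r.region == client_region]
    pool = local or candidates
    # Prefer the longest cached prefix, then the least-loaded replica.
    best = max(
        pool,
        key=lambda r: (shared_prefix_len(prompt, r.cached_prefixes), -r.in_flight),
    )
    best.in_flight += 1
    return best
```

In this toy policy a request stays in its own region while a local replica has room, and within the chosen pool it goes to the replica whose cache shares the longest prefix with the prompt. A real system would also need to weigh cross-region network latency and track cache contents approximately.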

Findings

01. Achieves 1.12-2.06x higher throughput

02. Reduces latency by 1.74-6.30x

03. Cuts total serving cost by 25%

Abstract

Serving Large Language Models (LLMs) efficiently in multi-region setups remains a challenge. Due to cost and GPU availability concerns, providers typically deploy LLMs in multiple regions using instances with long-term commitments, such as reserved instances or on-premise clusters. These instances are often underutilized because each region handles only its own traffic, and that traffic varies diurnally. In this paper, we introduce SkyWalker, a multi-region load balancer for LLM inference that aggregates regional diurnal patterns through cross-region traffic handling. By doing so, SkyWalker enables providers to reserve instances based on expected global demand, rather than peak demand in each individual region. Meanwhile, SkyWalker preserves KV-Cache locality and load balancing, ensuring cost efficiency without sacrificing performance. SkyWalker achieves this with a cache-aware cross-region traffic handler and a…
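The provisioning argument in the abstract can be illustrated with a toy calculation. The sketch below is not from the paper. The cosine-shaped demand curve, the region names, the peak hours, and the request rates are all hypothetical, chosen only to show why the peak of the aggregated demand can sit well below the sum of the per-region peaks.

```python
import math

# Hypothetical regions and the UTC hour at which each one's traffic peaks.
REGION_PEAK_HOURS = {"us": 20, "eu": 12, "asia": 4}


def diurnal_demand(hour_utc, peak_hour_utc, base=20.0, amplitude=80.0):
    """Made-up demand curve (requests/s): a cosine that tops out at peak_hour_utc."""
    phase = 2 * math.pi * (hour_utc - peak_hour_utc) / 24
    return base + amplitude * (1 + math.cos(phase)) / 2


def per_region_peaks(regions):
    """Peak demand of each region over one day, sampled hourly."""
    return {
        name: max(diurnal_demand(h, peak) for h in range(24))
        for name, peak in regions.items()
    }


def global_peak(regions):
    """Peak of the summed demand across all regions, sampled hourly."""
    return max(
        sum(diurnal_demand(h, peak) for peak in regions.values())
        for h in range(24)
    )


# Capacity needed if every region is provisioned for its own peak.
sum_of_peaks = sum(per_region_peaks(REGION_PEAK_HOURS).values())
# Capacity needed if traffic can be served in any region.
peak_of_sum = global_peak(REGION_PEAK_HOURS)

print(f"region-local provisioning: {sum_of_peaks:.0f} req/s")
print(f"global provisioning:       {peak_of_sum:.0f} req/s")
```

With these made-up curves the three regions peak eight hours apart, so their highs and lows largely cancel and the global requirement is noticeably smaller than the region-local one. Real traffic is less regular, so the gap in practice depends on how far apart the regional peaks actually fall.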
