SkyServe: Serving AI Models across Regions and Clouds with Spot Instances
Ziming Mao, Tian Xia, Zhanghao Wu, Wei-Lin Chiang, Tyler Griggs, Romil Bhardwaj, Zongheng Yang, Scott Shenker, Ion Stoica

TL;DR
SkyServe is a system that efficiently uses spot and on-demand cloud resources across regions and clouds to reduce costs and improve AI model serving reliability and latency.
Contribution
It introduces SpotHedge, a novel policy for spreading spot replicas across failure domains to enhance availability and cost-efficiency in AI model serving.
Findings
Reduces AI serving costs by 43% on average.
Improves P50, P90, and P99 latency by more than 2x.
Achieves high resource availability with mixed spot and on-demand replicas.
Abstract
Recent years have witnessed an explosive growth of AI models. The high cost of hosting AI services on GPUs and their demanding service requirements make it timely and challenging to lower service costs and guarantee service quality. While spot instances have long been offered with a large discount, spot preemptions have discouraged users from using them to host model replicas when serving AI models. To address this, we propose a simple yet efficient policy, SpotHedge, that leverages spot replicas across different failure domains (e.g., regions and clouds) to ensure availability, lower costs, and high service quality. SpotHedge intelligently spreads spot replicas across different regions and clouds to improve availability and reduce correlated preemptions, overprovisions more cheap spot replicas than required as a safeguard against possible preemptions, and dynamically falls back to…
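The three ideas in the abstract (spreading spot replicas across failure domains, overprovisioning spot replicas, and falling back when spot capacity is short) can be illustrated with a small planning function. This is only a sketch under stated assumptions, not the authors' algorithm: the function name `plan_replicas`, its parameters, the zone strings, and the "fill the least-populated zone first" rule are all hypothetical, and on-demand instances are assumed as the fallback because the TL;DR mentions mixing spot and on-demand replicas.

```python
from collections import Counter


def plan_replicas(target, extra_spot, zones, ready_spot):
    """Return (spot_to_launch, on_demand_to_launch) for one control step.

    Hypothetical helper, not taken from the paper.
    target      -- number of ready replicas the service needs
    extra_spot  -- spot replicas requested beyond the target, as a safeguard
    zones       -- candidate failure domains, in preference order
    ready_spot  -- mapping zone -> spot replicas currently ready
    """
    if not zones:
        raise ValueError("need at least one zone")

    current = Counter({z: ready_spot.get(z, 0) for z in zones})
    ready_total = sum(current.values())

    # Overprovision: aim for target + extra_spot spot replicas in total.
    missing_spot = max(0, target + extra_spot - ready_total)

    # Spread: place each new spot request in the zone that currently
    # holds the fewest replicas (ties go to the earlier zone in the list).
    spot_to_launch = Counter()
    for _ in range(missing_spot):
        zone = min(zones, key=lambda z: current[z])
        current[zone] += 1
        spot_to_launch[zone] += 1

    # Fallback (assumed): cover the gap between ready spot replicas and
    # the target with on-demand replicas until spot capacity arrives.
    on_demand_to_launch = max(0, target - ready_total)
    return dict(spot_to_launch), on_demand_to_launch


if __name__ == "__main__":
    zones = ["aws/us-east-1a", "aws/us-west-2b", "gcp/us-central1-a"]
    print(plan_replicas(4, 2, zones, {"aws/us-east-1a": 2}))
```

With a target of 4, 2 extra spot replicas, and 2 spot replicas already ready in one zone, the sketch requests 4 more spot replicas split evenly over the two empty zones and 2 on-demand replicas to cover the current shortfall. A real policy would also need to track preemption history and scale the on-demand replicas back down once spot replicas become ready.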
Taxonomy
Topics: Scientific Computing and Data Management · Traffic Prediction and Management Techniques · Cloud Computing and Resource Management
