Exploiting Spot Instances for Time-Critical Cloud Workloads Using Optimal Randomized Strategies
Neelkamal Bhuyan, Randeep Bhatia, Murali Kodialam, TV Lakshman

TL;DR
This paper introduces ROSS, a randomized scheduling algorithm that optimally balances cost and deadlines for cloud jobs using spot and on-demand instances, outperforming existing methods.
Contribution
The paper proves a fundamental limit on the competitive ratio of deterministic policies and proposes ROSS, a randomized algorithm with an optimal competitive ratio for deadline-aware cloud scheduling.
Findings
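ROSS achieves a competitive ratio of √K under reasonable deadlines, where K is the cost ratio between on-demand and spot instances, improving over deterministic policies.
Extensive evaluations on Azure and AWS trace data show ROSS reduces cost by up to 30% compared to the state of the art.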
ROSS effectively balances cost and deadline guarantees in real-world cloud environments.
Abstract
This paper addresses the challenge of deadline-aware online scheduling for jobs in hybrid cloud environments, where jobs may run on either cost-effective but unreliable spot instances or more expensive on-demand instances, under hard deadlines. We first establish a fundamental limit for existing (predominantly-) deterministic policies, proving a worst-case bound on their competitive ratio in terms of K, the cost ratio between on-demand and spot instances. We then present a novel randomized scheduling algorithm, ROSS, that achieves a provably optimal competitive ratio of √K under reasonable deadlines, significantly improving upon existing approaches. Extensive evaluations on real-world trace data from Azure and AWS demonstrate that ROSS effectively balances cost optimization and deadline guarantees, consistently outperforming the state-of-the-art by up to 30% in cost savings,…
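To make the quantities in the abstract concrete, here is a minimal Python sketch of the cost ratio K and of the cost ratio that a competitive-ratio bound controls. It is not ROSS or any policy from the paper. The prices, the slot-based job model, and the helper names `job_cost` and `cost_ratio` are assumptions chosen only for illustration, and the value computed is the ratio on one input, whereas a competitive ratio is the worst case of this quantity over all inputs.

```python
import math

def job_cost(schedule, spot_price, on_demand_price):
    """Total cost of a schedule given as a list of per-slot choices.

    Each slot is 'spot', 'on_demand', or 'idle' (hypothetical job model).
    """
    price = {"spot": spot_price, "on_demand": on_demand_price, "idle": 0.0}
    return sum(price[slot] for slot in schedule)

def cost_ratio(online_cost, offline_optimal_cost):
    """Cost of an online policy divided by the offline optimum on one input."""
    return online_cost / offline_optimal_cost

# Assumed prices: on-demand is 4x the spot price, so K = 4.
spot_price = 1.0
on_demand_price = 4.0
K = on_demand_price / spot_price

# A job needing 10 slots of work, on a trace where spot is always available.
work_slots = 10
online = ["on_demand"] * work_slots   # naive policy: never risk spot
offline = ["spot"] * work_slots       # hindsight optimum on this trace

ratio = cost_ratio(
    job_cost(online, spot_price, on_demand_price),
    job_cost(offline, spot_price, on_demand_price),
)

print(f"K = {K}, naive ratio on this trace = {ratio}, sqrt(K) = {math.sqrt(K)}")
```

On this trace the always-on-demand policy pays K times the optimum, while the √K figure quoted for ROSS would correspond to a factor of 2 when K = 4.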
Taxonomy
Topics: Cloud Computing and Resource Management · Distributed and Parallel Computing Systems · Optimization and Search Problems
