CASH: A Credit Aware Scheduling for Public Cloud Platforms
Aakash Sharma, Saravanan Dhakshinamurthy, George Kesidis, Chita R. Das

TL;DR
This paper introduces a credit-aware scheduling approach for public cloud platforms that optimizes task placement and resource utilization in big data processing, yielding cost savings of up to 22% and speeding up streaming SQL queries by up to 31%.
Contribution
The paper presents modifications to YARN, Hadoop, and Tez to incorporate hardware quality-of-service awareness, enabling more efficient and cost-effective cloud resource management.
Findings
CPU credit instances like Amazon T3 are cost-effective for big data workloads.
Optimized scheduling accelerates streaming SQL queries by up to 31%.
Cost savings of up to 22% are achieved in cloud-based big data processing.
Abstract
The public cloud offers a myriad of services that allow tenants to process large-scale big data in a flexible, easy, and cost-effective manner. Tenants generally use large-scale data processing frameworks such as MapReduce, Tez, and Spark to process their data. Tenants can configure their frameworks to schedule individual tasks themselves or use a middleware cluster manager such as YARN or Mesos to arbitrate resource scheduling in their public-cloud cluster. Cluster managers need to be cognizant of the workload requirements as well as the state of individual resources, such as CPU and disk, in the cluster. Cloud providers use a token bucket mechanism for individual hardware resources as an indicator of the quality of service that each resource can provide. In this paper, through our changes in YARN, Hadoop and Tez, we show how middleware cluster…
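To make the token bucket idea in the abstract concrete, here is a minimal Python sketch of how a scheduler might track per-node credits and prefer the node with the largest balance. It is an illustration under stated assumptions, not the authors' implementation: the `TokenBucket` class, the `pick_node` helper, the node names, and all numeric values are hypothetical, and the placement rule (pick the eligible node with the most credits) is a simple stand-in rather than the policy described in the paper.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class TokenBucket:
    """Hypothetical model of a per-resource credit bucket."""
    capacity: float     # maximum credits the bucket can hold
    refill_rate: float  # credits earned per second
    tokens: float       # credits currently available

    def refill(self, elapsed_s: float) -> None:
        # Credits accrue over time but never exceed the bucket capacity.
        self.tokens = min(self.capacity, self.tokens + self.refill_rate * elapsed_s)

    def consume(self, amount: float) -> bool:
        # Spend credits only if the bucket holds enough of them.
        if amount <= self.tokens:
            self.tokens -= amount
            return True
        return False


def pick_node(buckets: Dict[str, TokenBucket], demand: float) -> Optional[str]:
    """Return the node with the most credits that can cover `demand`, if any."""
    eligible = {name: b for name, b in buckets.items() if b.tokens >= demand}
    if not eligible:
        return None
    return max(eligible, key=lambda name: eligible[name].tokens)


# Illustrative two-node cluster; names and numbers are made up.
cluster = {
    "node-a": TokenBucket(capacity=100.0, refill_rate=1.0, tokens=10.0),
    "node-b": TokenBucket(capacity=100.0, refill_rate=1.0, tokens=60.0),
}
chosen = pick_node(cluster, demand=30.0)
```

In this sketch a task that needs 30 credits is steered away from the nearly depleted node, which is the kind of resource-state awareness the abstract says a cluster manager needs.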
