Smartpick: Workload Prediction for Serverless-enabled Scalable Data Analytics Systems

Anshuman Das Mohapatra; Kwangsung Oh

arXiv:2307.13677·cs.DC·July 26, 2023

TL;DR

Smartpick is a system that intelligently combines serverless and virtual machine resources for scalable data analytics, using machine learning to optimize configurations for cost and performance in cloud environments.

Contribution

Smartpick introduces an ML-based approach, a Random Forest predictor paired with a Bayesian optimizer, that chooses serverless and VM configurations to improve the cost-performance tradeoff in data analytics systems.

Findings

01. Achieved up to 50% cost reduction compared to baselines.

02. Predicted configurations with over 97% accuracy on AWS and Google Cloud.

03. Effectively handles workload dynamics through event-driven retraining.

Abstract

Many data analytics systems have adopted a newly emerging compute resource, serverless (SL), to handle data analytics queries in a timely and cost-efficient manner, i.e., serverless data analytics. While these systems can start processing queries quickly thanks to the agility and scalability of SL, they may encounter performance and cost bottlenecks, depending on the workload, because SL performs worse and costs more than traditional compute resources, e.g., virtual machines (VMs). In this project, we introduce Smartpick, an SL-enabled scalable data analytics system that exploits SL and VM together to realize composite benefits, i.e., agility from SL and better performance with reduced cost from VM. Smartpick uses a machine learning prediction scheme, a decision-tree-based Random Forest with a Bayesian Optimizer, to determine SL and VM configurations, i.e., how many SL and VM instances for…
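The configuration choice the abstract describes (how many SL and VM instances to use for a query) can be illustrated with a small sketch. This is not Smartpick's implementation: the Random Forest predictor is replaced here by a toy analytic runtime model, and the Bayesian optimizer by an exhaustive grid search. Every name, price, and rate below (`predict_runtime`, `estimate_cost`, `pick_config`, the per-second prices, the VM boot delay) is a hypothetical placeholder chosen only to show the shape of the decision.

```python
from itertools import product

# All constants below are hypothetical placeholders, not values from the paper.
SL_PRICE_PER_SEC = 0.00005   # assumed price of one SL instance per second
VM_PRICE_PER_SEC = 0.00002   # assumed price of one VM instance per second
VM_BOOT_SEC = 30.0           # assumed VM startup delay; SL assumed to start instantly
SL_RATE = 1.0                # assumed work units per second per SL instance
VM_RATE = 2.0                # assumed work units per second per VM instance


def predict_runtime(n_sl, n_vm, work_units=1200.0):
    """Toy stand-in for a learned runtime predictor (seconds)."""
    sl_rate = n_sl * SL_RATE
    vm_rate = n_vm * VM_RATE
    if sl_rate == 0 and vm_rate == 0:
        return float("inf")  # no resources, the query never finishes
    done_before_boot = sl_rate * VM_BOOT_SEC
    # SL alone finishes the query, either with no VMs or before they boot.
    if n_vm == 0 or done_before_boot >= work_units:
        return work_units / sl_rate
    # Otherwise SL works alone during boot, then SL and VM work together.
    remaining = work_units - done_before_boot
    return VM_BOOT_SEC + remaining / (sl_rate + vm_rate)


def estimate_cost(n_sl, n_vm, runtime):
    """Bill every instance for the full predicted runtime (a simplification)."""
    return runtime * (n_sl * SL_PRICE_PER_SEC + n_vm * VM_PRICE_PER_SEC)


def pick_config(deadline_sec, max_sl, max_vm):
    """Return the cheapest (n_sl, n_vm, runtime, cost) meeting the deadline, or None."""
    best = None
    for n_sl, n_vm in product(range(max_sl + 1), range(max_vm + 1)):
        runtime = predict_runtime(n_sl, n_vm)
        if runtime > deadline_sec:
            continue  # configuration is too slow for this query
        cost = estimate_cost(n_sl, n_vm, runtime)
        if best is None or cost < best[3]:
            best = (n_sl, n_vm, runtime, cost)
    return best


if __name__ == "__main__":
    print(pick_config(deadline_sec=60.0, max_sl=20, max_vm=10))
```

Under these assumptions, SL instances cover the period before VMs are available, while the cheaper VM instances take over the bulk of the work, which mirrors the agility versus cost tradeoff the abstract describes. A learned predictor and a sample-efficient optimizer would replace the toy model and the grid search once the configuration space becomes too large to enumerate.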
