Smartpick: Workload Prediction for Serverless-enabled Scalable Data Analytics Systems
Anshuman Das Mohapatra, Kwangsung Oh

TL;DR
Smartpick is a system that intelligently combines serverless and virtual machine resources for scalable data analytics, using machine learning to optimize configurations for cost and performance in cloud environments.
Contribution
Smartpick introduces an ML-based approach, a decision-tree-based Random Forest paired with a Bayesian optimizer, that determines how many serverless and VM instances to use, enabling better cost-performance tradeoffs in data analytics systems.
Findings
Achieved up to 50% cost reduction compared to baselines.
Predicted configurations with over 97% accuracy on AWS and Google Cloud.
Effectively handles workload dynamics through event-driven retraining.
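The summary does not say which events trigger retraining. As an illustration only, the sketch below assumes one plausible trigger: retrain when the mean relative error between predicted and observed query runtimes over a sliding window exceeds a threshold. The class name `RetrainTrigger`, the window size, and the threshold are hypothetical and are not taken from the paper.

```python
from collections import deque


class RetrainTrigger:
    """Hypothetical trigger: signal a retrain when recent prediction error is high."""

    def __init__(self, threshold: float = 0.2, window: int = 5) -> None:
        self.threshold = threshold            # assumed mean relative-error limit
        self.window = window                  # assumed number of recent queries tracked
        self.errors = deque(maxlen=window)    # sliding window of relative errors

    def observe(self, predicted: float, actual: float) -> bool:
        """Record one (predicted, actual) runtime pair; return True if retraining is due."""
        rel_err = abs(predicted - actual) / max(actual, 1e-9)
        self.errors.append(rel_err)
        window_full = len(self.errors) == self.window
        if window_full and sum(self.errors) / self.window > self.threshold:
            self.errors.clear()               # start a fresh window after signalling
            return True
        return False
```

A caller would feed each completed query's predicted and measured runtime into `observe` and refit the prediction model whenever it returns `True`.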
Abstract
Many data analytics systems have adopted a newly emerging compute resource, serverless (SL), to handle data analytics queries in a timely and cost-efficient manner, i.e., serverless data analytics. While these systems can start processing queries quickly thanks to the agility and scalability of SL, they may encounter performance and cost bottlenecks, depending on the workload, because SL performs worse and costs more than traditional compute resources, e.g., virtual machines (VMs). In this project, we introduce Smartpick, an SL-enabled scalable data analytics system that exploits SL and VM together to realize composite benefits, i.e., agility from SL and better performance with reduced cost from VM. Smartpick uses a machine learning prediction scheme, a decision-tree-based Random Forest with a Bayesian Optimizer, to determine SL and VM configurations, i.e., how many SL and VM instances for…
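To make the configuration question concrete (how many SL and VM instances to launch for a query), here is a minimal sketch. It is not Smartpick's method: a toy analytic runtime model stands in for the learned Random Forest predictor, and exhaustive search stands in for the Bayesian optimizer. All prices, speeds, the VM startup delay, and the function names are hypothetical values chosen for illustration.

```python
from dataclasses import dataclass
from itertools import product
from math import inf
from typing import Optional

# Hypothetical constants, not measurements from the paper.
SL_PRICE_PER_SEC = 0.00005   # cost per SL instance-second
VM_PRICE_PER_SEC = 0.00002   # cost per VM instance-second
SL_SPEED = 1.0               # work units per second per SL instance
VM_SPEED = 1.5               # work units per second per VM instance
VM_STARTUP_SEC = 30.0        # VMs join late; SL starts immediately


@dataclass(frozen=True)
class Config:
    n_sl: int
    n_vm: int


def predict_runtime(work: float, cfg: Config) -> float:
    """Toy stand-in for a learned runtime predictor."""
    if work <= 0:
        return 0.0
    sl_rate = cfg.n_sl * SL_SPEED
    vm_rate = cfg.n_vm * VM_SPEED
    if sl_rate + vm_rate == 0:
        return inf
    done_before_vm = sl_rate * VM_STARTUP_SEC   # work SL finishes while VMs boot
    if done_before_vm >= work:
        return work / sl_rate
    return VM_STARTUP_SEC + (work - done_before_vm) / (sl_rate + vm_rate)


def predict_cost(work: float, cfg: Config) -> float:
    """Simplification: every instance is billed for the whole query runtime."""
    runtime = predict_runtime(work, cfg)
    return runtime * (cfg.n_sl * SL_PRICE_PER_SEC + cfg.n_vm * VM_PRICE_PER_SEC)


def pick_config(work: float, deadline: float,
                max_sl: int, max_vm: int) -> Optional[Config]:
    """Exhaustively search for the cheapest configuration that meets the deadline."""
    best, best_cost = None, inf
    for n_sl, n_vm in product(range(max_sl + 1), range(max_vm + 1)):
        cfg = Config(n_sl, n_vm)
        if predict_runtime(work, cfg) > deadline:
            continue                             # infeasible, including (0, 0)
        cost = predict_cost(work, cfg)
        if cost < best_cost:
            best, best_cost = cfg, cost
    return best
```

Calling `pick_config(work=1000, deadline=60, max_sl=20, max_vm=20)` returns the cheapest feasible mix under this toy model, or `None` when no mix meets the deadline. Exhaustive search is only practical for small grids, which is one reason a real system would want a sample-efficient optimizer instead.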
