OptEx: A Deadline-Aware Cost Optimization Model for Spark
Subhajit Sidhanta, Wojciech Golab, and Supratik Mukhopadhyay

TL;DR
OptEx is a closed-form analytical model that estimates Spark job completion time on cloud clusters and identifies the cost-optimal cluster composition for a given deadline, supporting deadline-aware resource provisioning.
Contribution
To the authors' knowledge, it introduces the first closed-form analytical model of Spark job completion time, and applies that model to cost optimization under SLO deadlines.
Findings
Achieves a mean relative error of 6% in estimating job completion time.
Correctly estimates cost-optimal cluster configurations with 98% accuracy.
Enables deadline-aware resource provisioning for Spark jobs.
Abstract
We present OptEx, a closed-form model of job execution on Apache Spark, a popular parallel processing engine. To the best of our knowledge, OptEx is the first work that analytically models job completion time on Spark. The model can be used to estimate the completion time of a given Spark job on a cloud with respect to the size of the input dataset, the number of iterations, and the number of nodes comprising the underlying cluster. Experimental results demonstrate that OptEx yields a mean relative error of 6% in estimating the job completion time. Furthermore, the model can be applied to estimate the cost-optimal cluster composition for running a given Spark job on a cloud under a completion deadline specified in the SLO (i.e., Service Level Objective). We show experimentally that OptEx is able to correctly estimate the cost-optimal cluster composition for running a given Spark job…
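The provisioning idea in the abstract can be illustrated with a minimal Python sketch. It is not the OptEx model: the closed-form expression used here (a fixed overhead, plus per-iteration work that shrinks with the number of nodes, plus a per-node coordination term) is an assumption made only for illustration. The names `ToyJobProfile`, `estimate_time_s`, and `cheapest_cluster`, and all coefficients, are hypothetical. The sketch shows how any completion-time estimate that depends on input size, iteration count, and node count can be searched for the cheapest cluster size that still meets a deadline.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class ToyJobProfile:
    """Hypothetical per-job coefficients; NOT taken from the OptEx paper."""
    fixed_overhead_s: float       # startup time, independent of cluster size
    work_s_per_gb_iter: float     # parallelizable work per GB per iteration
    coord_s_per_node_iter: float  # per-iteration coordination cost per node


def estimate_time_s(p: ToyJobProfile, input_gb: float,
                    iterations: int, nodes: int) -> float:
    """Toy closed-form estimate (an assumed form, not the paper's formula)."""
    parallel = p.work_s_per_gb_iter * input_gb / nodes
    coordination = p.coord_s_per_node_iter * nodes
    return p.fixed_overhead_s + iterations * (parallel + coordination)


def cheapest_cluster(p: ToyJobProfile, input_gb: float, iterations: int,
                     deadline_s: float, price_per_node_hour: float,
                     max_nodes: int = 64) -> Optional[Tuple[int, float, float]]:
    """Return (nodes, cost, estimated_time_s) for the cheapest cluster size
    whose estimated time meets the deadline, or None if none qualifies."""
    best = None
    for nodes in range(1, max_nodes + 1):
        t = estimate_time_s(p, input_gb, iterations, nodes)
        if t > deadline_s:
            continue  # this cluster size would miss the SLO deadline
        cost = nodes * price_per_node_hour * t / 3600.0
        if best is None or cost < best[1]:
            best = (nodes, cost, t)
    return best


if __name__ == "__main__":
    profile = ToyJobProfile(fixed_overhead_s=30.0,
                            work_s_per_gb_iter=12.0,
                            coord_s_per_node_iter=0.5)
    print(cheapest_cluster(profile, input_gb=10.0, iterations=5,
                           deadline_s=200.0, price_per_node_hour=0.5))
```

An exhaustive scan over node counts is used because it stays correct whatever shape the estimate takes. With a real closed-form model, the optimum could instead be derived analytically or found with a constrained solver.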
