Experimentally Evaluating the Resource Efficiency of Big Data Autoscaling

Jonathan Will; Nico Treide; Lauritz Thamsen; Odej Kao

arXiv:2501.14456·cs.DC·January 27, 2025

TL;DR

This paper evaluates the resource efficiency of autoscaling in big data systems like Spark, finding no significant gains over static allocations due to inherent limitations in autoscaling approaches.

Contribution

It provides a conceptual and experimental analysis of autoscaling resource efficiency in Spark, highlighting fundamental limitations.

Findings

01 No significant resource efficiency gain over static allocations

02 Inelasticity of node size limits autoscaling benefits

03 Memory-to-CPU ratio in autoscaling is inherently constrained
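
One way to read the last two findings is that autoscaling changes how many executors a job has, not how large each one is. The Python sketch below is a toy model of that reading, not the paper's methodology: the names ExecutorShape, allocation, and executors_needed and all numeric values are hypothetical and chosen only for illustration.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class ExecutorShape:
    """Fixed per-executor resources (hypothetical values, not from the paper)."""
    cores: int
    memory_gb: float


def allocation(shape: ExecutorShape, executors: int) -> tuple[int, float]:
    """Total (cores, memory_gb) after scaling out to `executors` executors."""
    return shape.cores * executors, shape.memory_gb * executors


def executors_needed(shape: ExecutorShape, cores_demand: int,
                     memory_demand_gb: float) -> int:
    """Smallest executor count that covers both the CPU and the memory demand."""
    by_cpu = math.ceil(cores_demand / shape.cores)
    by_mem = math.ceil(memory_demand_gb / shape.memory_gb)
    return max(by_cpu, by_mem)


# Assumed executor size: 4 cores and 16 GB each.
shape = ExecutorShape(cores=4, memory_gb=16.0)

# Scaling the executor count scales cores and memory together,
# so the memory-to-CPU ratio stays the same at every cluster size.
ratios = {}
for count in (1, 2, 8, 32):
    total_cores, total_memory_gb = allocation(shape, count)
    ratios[count] = total_memory_gb / total_cores

# Assumed memory-heavy stage: 8 cores of work, 128 GB to hold in memory.
needed = executors_needed(shape, cores_demand=8, memory_demand_gb=128.0)
cores, memory_gb = allocation(shape, needed)
idle_cores = cores - 8

print(ratios)
print(needed, cores, memory_gb, idle_cores)
```

Under these assumed numbers, memory demand forces 8 executors, which brings 32 cores for 8 cores of work. Adding or removing executors cannot fix that mismatch, because every cluster size has the same 4 GB per core.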

Abstract

Distributed dataflow systems like Spark and Flink enable data-parallel processing of large datasets on clusters. Yet, selecting appropriate computational resources for dataflow jobs is often challenging. For efficient execution, individual resource allocations, such as memory and CPU cores, must meet the specific resource requirements of the job. An alternative to selecting a static resource allocation for a job execution is autoscaling as implemented for example by Spark. In this paper, we evaluate the resource efficiency of autoscaling batch data processing jobs based on resource demand both conceptually and experimentally by analyzing a new dataset of Spark job executions on Google Dataproc Serverless. In our experimental evaluation, we show that there is no significant resource efficiency gain over static resource allocations. We found that the inherent conceptual limitations of…

Code & Models

Repositories

dos-group/spark-autoscaling-evaluation (official)
