Collaborative Cluster Configuration for Distributed Data-Parallel Processing: A Research Overview
Lauritz Thamsen, Dominik Scheinert, Jonathan Will, Jonathan Bader, Odej Kao

TL;DR
This paper discusses a collaborative approach to configuring distributed data-parallel processing clusters, leveraging shared runtime data and performance models to improve resource utilization and reduce costs.
Contribution
It introduces methods for sharing runtime data across different jobs and infrastructures to build accurate, reusable performance models for cluster configuration optimization.
Findings
Shared runtime data improves performance model accuracy.
Similarity-based data aggregation enhances configuration predictions.
Dynamic reconfiguration reduces resource wastage.
Abstract
Many organizations routinely analyze large datasets using systems for distributed data-parallel processing and clusters of commodity resources. Yet, users need to configure adequate resources for their data processing jobs. This requires significant insights into expected job runtimes and scaling behavior, resource characteristics, input data distributions, and other factors. Unable to estimate performance accurately, users frequently overprovision resources for their jobs, leading to low resource utilization and high costs. In this paper, we present major building blocks towards a collaborative approach for optimization of data processing cluster configurations based on runtime data and performance models. We believe that runtime data can be shared and used for performance models across different execution contexts, significantly reducing the reliance on the recurrence of individual…
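As a rough illustration of how shared runtime data could feed a performance model for cluster configuration, the sketch below fits a simple scale-out model to runs collected from other contexts, weighting each run by how similar its context is to the target job. This is not the authors' method. The model form `runtime = a + b / scale_out`, the `Run` record, the `similarity` scores, and all function names are assumptions made for this example only.

```python
from dataclasses import dataclass


@dataclass
class Run:
    scale_out: int     # number of worker nodes used for the run
    runtime_s: float   # observed runtime in seconds
    similarity: float  # hypothetical similarity to the target context, > 0


def fit_scaleout_model(runs):
    """Weighted least squares for the assumed model runtime = a + b / scale_out."""
    xs = [1.0 / r.scale_out for r in runs]
    ys = [r.runtime_s for r in runs]
    ws = [r.similarity for r in runs]
    sw = sum(ws)
    mx = sum(w * x for w, x in zip(ws, xs)) / sw
    my = sum(w * y for w, y in zip(ws, ys)) / sw
    sxx = sum(w * (x - mx) ** 2 for w, x in zip(ws, xs))
    if sxx == 0:
        raise ValueError("need runs at two or more distinct scale-outs")
    b = sum(w * (x - mx) * (y - my) for w, x, y in zip(ws, xs, ys)) / sxx
    a = my - b * mx
    return a, b


def predict(model, scale_out):
    """Predicted runtime in seconds at the given scale-out."""
    a, b = model
    return a + b / scale_out


def smallest_scale_out(model, target_s, candidates):
    """Smallest candidate scale-out whose predicted runtime meets the target."""
    for n in sorted(candidates):
        if predict(model, n) <= target_s:
            return n
    return None


# Made-up shared runs from other users; more similar contexts get more weight.
shared_runs = [
    Run(scale_out=2, runtime_s=660.0, similarity=0.9),
    Run(scale_out=4, runtime_s=360.0, similarity=0.5),
    Run(scale_out=8, runtime_s=210.0, similarity=0.7),
]
model = fit_scaleout_model(shared_runs)
choice = smallest_scale_out(model, target_s=200.0, candidates=range(1, 33))
```

Picking the smallest scale-out that meets a runtime target is one simple way to avoid overprovisioning; a real system would also need to account for prediction error and resource prices.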
