Collaborative Cluster Configuration for Distributed Data-Parallel Processing: A Research Overview

Lauritz Thamsen; Dominik Scheinert; Jonathan Will; Jonathan Bader; Odej Kao

arXiv:2206.00429·cs.DC·June 2, 2022

TL;DR

This paper outlines a collaborative approach to configuring clusters for distributed data-parallel processing, in which runtime data shared across execution contexts feeds performance models that improve resource utilization and reduce costs.

Contribution

It introduces methods for sharing runtime data across different jobs and infrastructures to build accurate, reusable performance models for cluster configuration optimization.

Findings

01 Shared runtime data improves performance model accuracy.

02 Similarity-based data aggregation enhances configuration predictions.

03 Dynamic reconfiguration reduces resource wastage.

Abstract

Many organizations routinely analyze large datasets using systems for distributed data-parallel processing and clusters of commodity resources. Yet, users need to configure adequate resources for their data processing jobs. This requires significant insights into expected job runtimes and scaling behavior, resource characteristics, input data distributions, and other factors. Unable to estimate performance accurately, users frequently overprovision resources for their jobs, leading to low resource utilization and high costs. In this paper, we present major building blocks towards a collaborative approach for optimization of data processing cluster configurations based on runtime data and performance models. We believe that runtime data can be shared and used for performance models across different execution contexts, significantly reducing the reliance on the recurrence of individual…
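
To make the idea of configuring a cluster from shared runtime data concrete, the following is a minimal sketch and not the authors' method. It assumes a deliberately simple scaling model, runtime = a + b / nodes, fitted by ordinary least squares. The record schema (`RunRecord`), the function names, the source labels, and all numbers are hypothetical and synthetic, chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Sequence, Tuple


@dataclass(frozen=True)
class RunRecord:
    """One observed job execution (hypothetical schema, for illustration only)."""
    source: str       # who contributed the record, e.g. another user or organization
    nodes: int        # scale-out used for the run
    runtime_s: float  # measured runtime in seconds


def fit_scaling_model(records: Sequence[RunRecord]) -> Tuple[float, float]:
    """Least-squares fit of the assumed model runtime = a + b / nodes."""
    xs = [1.0 / r.nodes for r in records]
    ys = [r.runtime_s for r in records]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("need runs at two or more distinct scale-outs")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def smallest_scale_out(model: Tuple[float, float],
                       target_runtime_s: float,
                       candidates: Iterable[int]) -> Optional[int]:
    """Return the smallest candidate scale-out predicted to meet the target, if any."""
    a, b = model
    for nodes in sorted(candidates):
        if a + b / nodes <= target_runtime_s:
            return nodes
    return None


# Synthetic data: a single local run alone cannot determine the model,
# but pooling it with runs shared from another context can.
local_runs = [RunRecord("org_a", 2, 660.0)]
shared_runs = [
    RunRecord("org_b", 4, 360.0),
    RunRecord("org_b", 8, 210.0),
    RunRecord("org_b", 16, 135.0),
]

model = fit_scaling_model(local_runs + shared_runs)
choice = smallest_scale_out(model, target_runtime_s=300.0,
                            candidates=[2, 4, 6, 8, 12, 16])
print(model, choice)
```

Picking the smallest scale-out that meets a runtime target is one simple way to avoid the overprovisioning described above. A realistic model would also need to account for the factors the abstract lists, such as resource characteristics and input data distributions.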
