C3O: Collaborative Cluster Configuration Optimization for Distributed Data Processing in Public Clouds

Jonathan Will, Lauritz Thamsen, Dominik Scheinert, Jonathan Bader, Odej Kao

arXiv:2107.13317 · cs.DC · December 3, 2021

TL;DR

C3O is a system that leverages shared historical runtime data to optimize cloud cluster configurations for distributed data processing, improving resource utilization and reducing bottlenecks.

Contribution

It introduces a collaborative approach using regression models trained on shared data to predict job runtimes across various cluster configurations.
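The idea of training regression models on shared runtime data and using them to pick a cluster configuration can be illustrated with a small sketch. This is not C3O's implementation: the paper uses its own specialized regression models, whereas this sketch swaps in a plain single-feature linear model of runtime against data size per node, fitted separately per machine type on runs pooled from several users. The `Run` record, the machine type labels `type-a`/`type-b`, the node-hour prices, and the runtimes are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Run:
    """One shared historical job execution (hypothetical record format)."""
    machine_type: str     # hypothetical machine type label
    scale_out: int        # number of worker nodes
    data_size_gb: float   # input dataset size in GB
    runtime_s: float      # observed runtime in seconds


def fit_linear(xs, ys):
    """Ordinary least squares for y = a + b * x; needs at least two distinct x values."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    return mean_y - b * mean_x, b


def train_models(runs):
    """Fit one runtime model per machine type on the feature data_size / scale_out."""
    by_type = {}
    for r in runs:
        by_type.setdefault(r.machine_type, []).append(r)
    return {
        mtype: fit_linear([r.data_size_gb / r.scale_out for r in rs],
                          [r.runtime_s for r in rs])
        for mtype, rs in by_type.items()
    }


def choose_config(models, node_hour_prices, data_size_gb, max_runtime_s, scale_outs):
    """Return (cost, machine_type, scale_out, predicted_runtime_s) for the cheapest
    configuration predicted to finish within max_runtime_s, or None if none does."""
    best = None
    for mtype, (a, b) in models.items():
        for n in scale_outs:
            runtime = a + b * data_size_gb / n  # predicted runtime in seconds
            if runtime > max_runtime_s:
                continue
            cost = node_hour_prices[mtype] * n * runtime / 3600  # hypothetical pricing
            if best is None or cost < best[0]:
                best = (cost, mtype, n, runtime)
    return best


# Runs pooled from several (hypothetical) users; runtimes are synthetic.
shared_runs = [
    Run("type-a", 2, 100.0, 560.0), Run("type-a", 4, 100.0, 310.0), Run("type-a", 8, 100.0, 185.0),
    Run("type-b", 2, 100.0, 340.0), Run("type-b", 4, 100.0, 190.0), Run("type-b", 8, 100.0, 115.0),
]
models = train_models(shared_runs)
best = choose_config(models, {"type-a": 0.2, "type-b": 0.5},
                     data_size_gb=200.0, max_runtime_s=400.0, scale_outs=[2, 4, 8, 16])
print(best)
```

With these synthetic runs, the sketch selects 8 nodes of `type-a` with a predicted runtime of about 310 s, since larger or faster configurations meet the 400 s target only at a higher cost. A model that also accounts for the different execution contexts of users, as the abstract describes, would need additional features beyond data size per node.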

Findings

1. Mean absolute error below 3% in runtime predictions
2. Effective resource utilization in cloud clusters
3. Evaluated on 930 unique Spark jobs

Abstract

Distributed dataflow systems enable data-parallel processing of large datasets on clusters. Public cloud providers offer a large variety and quantity of resources that can be used for such clusters. Yet, selecting appropriate cloud resources for dataflow jobs (resources that lead neither to bottlenecks nor to low resource utilization) is often challenging, even for expert users such as data engineers. We present C3O, a collaborative system for optimizing data processing cluster configurations in public clouds based on shared historical runtime data. The shared data is utilized for predicting the runtimes of data processing jobs on different possible cluster configurations, using specialized regression models. These models take the diverse execution contexts of different users into account and exhibit mean absolute errors below 3% in our experimental evaluation with 930 unique Spark jobs.
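A mean absolute error quoted in percent is most naturally read as an error relative to the observed runtime. The sketch below assumes that reading (the exact formula is not stated here) and averages |predicted - actual| / actual over a set of jobs. The function name and the runtimes are hypothetical and are not data from the paper.

```python
def mean_relative_error_pct(predicted, actual):
    """Mean of |predicted - actual| / actual over all jobs, expressed in percent
    (assumed interpretation of a percentage mean absolute error)."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("expected two non-empty sequences of equal length")
    errors = [abs(p - a) / a for p, a in zip(predicted, actual)]
    return 100.0 * sum(errors) / len(errors)


# Synthetic predicted vs. observed runtimes in seconds (not data from the paper).
print(mean_relative_error_pct([102.0, 196.0, 300.0], [100.0, 200.0, 300.0]))
```

Here two of the three predictions are off by 2% and one is exact, so the mean relative error is about 1.33%, which would fall under a 3% threshold.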

Code & Models

Repositories

dos-group/c3o (official)