Co-Tuning of Cloud Infrastructure and Distributed Data Processing Platforms

Isuru Dharmadasa; Faheem Ullah

arXiv:2309.00269 · cs.DC · December 8, 2023

Open Access

TL;DR

This paper presents a machine learning-based co-tuning approach for cloud infrastructure and distributed data processing platforms like Hadoop, Spark, and Flink, significantly improving performance and reducing costs.

Contribution

It introduces a novel co-tuning method that jointly optimizes cloud and platform configurations, addressing a gap in existing isolated tuning approaches.

Findings

1. Reduces execution time by 17.5%

2. Lowers costs by 14.9%

3. Effective across multiple platforms and workloads

Abstract

Distributed Data Processing Platforms (e.g., Hadoop, Spark, and Flink) are widely used to store and process data in a cloud environment. These platforms distribute the storage and processing of data among the computing nodes of a cloud. The efficient use of these platforms requires users to (i) configure the cloud, i.e., determine the number and type of computing nodes, and (ii) tune the configuration parameters (e.g., data replication factor) of the platform. However, both of these tasks require in-depth knowledge of the cloud infrastructure and distributed data processing platforms. Therefore, in this paper, we first study the relationship between the configuration of the cloud and the configuration of distributed data processing platforms to determine how cloud configuration impacts platform configuration. After understanding the impacts, we propose a co-tuning approach for recommending…
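
The abstract names two coupled choices: the cloud configuration (number and type of nodes) and a platform parameter such as the data replication factor. The sketch below is only an illustration of why the two might be searched jointly and is not the paper's method. It enumerates a small joint search space and drops combinations where the replication factor exceeds the node count, since replicas of a block need distinct nodes to be useful. The names `JointConfig` and `joint_search_space`, the node-type labels, and the value ranges are all assumptions made for this example.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class JointConfig:
    """One candidate point in a hypothetical joint cloud + platform space."""
    node_type: str           # illustrative label, not a real instance type
    node_count: int          # cloud-side choice
    replication_factor: int  # platform-side choice


def joint_search_space(node_types, node_counts, replication_factors):
    """Enumerate cloud x platform combinations, keeping only consistent ones."""
    space = []
    for node_type, count, repl in product(node_types, node_counts, replication_factors):
        # Assumed constraint: a replication factor larger than the number
        # of nodes cannot be satisfied, so the cloud choice bounds the
        # platform choice.
        if repl <= count:
            space.append(JointConfig(node_type, count, repl))
    return space


space = joint_search_space(
    node_types=["small", "large"],   # hypothetical labels
    node_counts=[2, 4],
    replication_factors=[1, 2, 3],
)
print(len(space), "valid joint configurations")
```

Tuning the two sides in isolation would either miss this dependency or waste evaluations on invalid pairs; a joint space lets a recommender consider only combinations that are consistent with each other.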

Taxonomy

Topics: Cloud Computing and Resource Management · IoT and Edge/Fog Computing · Cloud Data Security Solutions