Co-Tuning of Cloud Infrastructure and Distributed Data Processing Platforms
Isuru Dharmadasa, Faheem Ullah

TL;DR
This paper presents a machine learning-based co-tuning approach for cloud infrastructure and distributed data processing platforms such as Hadoop, Spark, and Flink, reporting a 17.5% reduction in execution time and a 14.9% reduction in cost.
Contribution
It introduces a co-tuning method that jointly optimizes cloud and platform configurations, which existing approaches tune in isolation.
Findings
Reduces execution time by 17.5%
Lowers costs by 14.9%
Effective across multiple platforms and workloads
Abstract
Distributed data processing platforms (e.g., Hadoop, Spark, and Flink) are widely used to store and process data in a cloud environment. These platforms distribute the storage and processing of data among the computing nodes of a cloud. Using these platforms efficiently requires users to (i) configure the cloud, i.e., determine the number and type of computing nodes, and (ii) tune the configuration parameters (e.g., data replication factor) of the platform. However, both of these tasks require in-depth knowledge of the cloud infrastructure and of distributed data processing platforms. Therefore, in this paper, we first study the relationship between the configuration of the cloud and the configuration of distributed data processing platforms to determine how cloud configuration impacts platform configuration. After understanding the impacts, we propose a co-tuning approach for recommending…
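To make the idea of a joint configuration space concrete, the following is a minimal sketch, not the authors' method. It assumes a made-up runtime model, hypothetical node types and prices, and a plain exhaustive search in place of the machine learning approach the paper describes. The one dependency it encodes between the two layers is that a replication factor larger than the number of nodes cannot be fully satisfied, so the cloud choice limits which platform settings are worth considering. All names (`NODE_TYPES`, `toy_runtime_hours`, `joint_search`, and so on) are illustrative.

```python
from itertools import product

# All values below are hypothetical and chosen only for illustration.
NODE_TYPES = {"small": (2, 0.10), "large": (8, 0.40)}  # name -> (vCPUs, USD per node-hour)
NODE_COUNTS = [2, 4, 8]
REPLICATION_FACTORS = [1, 2, 3]


def is_valid(node_count, replication):
    """A platform setting is only usable if the cloud setting can support it."""
    return replication <= node_count


def toy_runtime_hours(node_type, node_count, replication):
    """Made-up runtime model (not measured data, not the paper's model)."""
    vcpus, _ = NODE_TYPES[node_type]
    compute = 64.0 / (vcpus * node_count)  # fixed amount of work spread over all cores
    write_overhead = 0.2 * replication     # each extra replica adds write work
    remote_reads = 1.0 / replication       # fewer replicas, more remote reads
    return compute + write_overhead + remote_reads


def toy_cost_usd(node_type, node_count, replication):
    """Cost under the toy model: runtime times hourly price times node count."""
    _, price = NODE_TYPES[node_type]
    return toy_runtime_hours(node_type, node_count, replication) * price * node_count


def joint_search():
    """Search cloud and platform settings together, skipping invalid pairs."""
    candidates = [
        (t, n, r)
        for t, n, r in product(NODE_TYPES, NODE_COUNTS, REPLICATION_FACTORS)
        if is_valid(n, r)
    ]
    return min(candidates, key=lambda c: toy_runtime_hours(*c))


if __name__ == "__main__":
    best = joint_search()
    print(best, round(toy_runtime_hours(*best), 2), round(toy_cost_usd(*best), 2))
```

Exhaustive search is only practical for a space this small. Real platforms expose many more parameters, which is presumably why a learned model is used in the paper; the details of that model are not available in this excerpt.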
Taxonomy
Topics: Cloud Computing and Resource Management · IoT and Edge/Fog Computing · Cloud Data Security Solutions
