Neural-based Modeling for Performance Tuning of Spark Data Analytics
Khaled Zaouk, Fei Song, Chenghao Lyu, Yanlei Diao

TL;DR
This paper introduces a deep learning-based approach for performance modeling of Spark data analytics workloads, enabling more accurate predictions for system tuning in cloud environments.
Contribution
It proposes workload embedding techniques and evaluates various deep learning models, demonstrating superior performance over existing tools for cloud analytics.
Findings
Deep learning models outperform traditional performance models.
Workload embeddings effectively capture computational characteristics.
The best model shows significant accuracy improvements over existing tools for cloud analytics.
Abstract
Cloud data analytics has become an integral part of enterprise business operations for data-driven insight discovery. Performance modeling of cloud data analytics is crucial for performance tuning and other critical operations in the cloud. Traditional modeling techniques fail to adapt to the high degree of diversity in workloads and system behaviors in this domain. In this paper, we bring recent Deep Learning techniques to bear on the process of automated performance modeling of cloud data analytics, with a focus on Spark data analytics as representative workloads. At the core of our work is the notion of learning workload embeddings (with a set of desired properties) to represent fundamental computational characteristics of different jobs, which enable performance prediction when used together with job configurations that control resource allocation and other system knobs. Our work…
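The abstract's core idea, pairing a workload embedding with a job configuration to predict performance, can be illustrated with a minimal sketch. This is not the authors' model. The embedding size, the three knobs (executor count, cores per executor, memory in GB), the job names, and the one-hidden-layer regressor are all hypothetical, and the weights are random placeholders rather than learned values.

```python
import random

random.seed(0)  # fixed seed so the placeholder weights are reproducible

EMBED_DIM = 4   # hypothetical embedding size
NUM_KNOBS = 3   # hypothetical knobs: executors, cores per executor, memory (GB)
HIDDEN = 8      # hypothetical hidden-layer width


def rand_vec(n):
    """Return n values drawn from a standard normal distribution."""
    return [random.gauss(0.0, 1.0) for _ in range(n)]


# Hypothetical embedding table: one vector per known workload.
# A real system would learn these; random values stand in here.
workload_embeddings = {
    "job_a": rand_vec(EMBED_DIM),
    "job_b": rand_vec(EMBED_DIM),
}

# Untrained regressor weights (placeholders, not a fitted model).
W1 = [rand_vec(HIDDEN) for _ in range(EMBED_DIM + NUM_KNOBS)]  # input x hidden
W2 = rand_vec(HIDDEN)                                          # hidden x 1


def predict_latency(workload_id, config):
    """Concatenate embedding and configuration, then apply a ReLU layer and a linear output."""
    x = workload_embeddings[workload_id] + [float(v) for v in config]
    hidden = [
        max(0.0, sum(x[i] * W1[i][j] for i in range(len(x))))
        for j in range(HIDDEN)
    ]
    return sum(h * w for h, w in zip(hidden, W2))


# Same workload, one candidate configuration: 4 executors, 2 cores each, 8 GB.
pred = predict_latency("job_a", [4, 2, 8.0])
```

Because the embedding is fixed per workload, a tuner could call `predict_latency` repeatedly with different configurations for the same job and compare the predictions. With untrained weights the numbers themselves carry no meaning.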
Taxonomy
Topics: Data Stream Mining Techniques · Traffic Prediction and Management Techniques · Neural Networks and Applications
