Karasu: A Collaborative Approach to Efficient Cluster Configuration for Big Data Analytics
Dominik Scheinert, Philipp Wiesner, Thorsten Wittkopp, Lauritz Thamsen, Jonathan Will, and Odej Kao

TL;DR
Karasu is a collaborative profiling approach that leverages shared workload information to efficiently optimize big data cluster configurations, reducing profiling time, cost, and energy use.
Contribution
Karasu introduces a novel collaborative profiling method that shares aggregated performance data among users to improve cluster configuration efficiency for big data analytics.
Findings
Significantly improves configuration search performance
Reduces profiling time and cost
Effective even with limited shared workload data
Abstract
Selecting the right resources for big data analytics jobs is hard because of the wide variety of configuration options like machine type and cluster size. As poor choices can have a significant impact on resource efficiency, cost, and energy usage, automated approaches are gaining popularity. Most existing methods rely on profiling recurring workloads to find near-optimal solutions over time. Due to the cold-start problem, this often leads to lengthy and costly profiling phases. However, big data analytics jobs across users can share many common properties: they often operate on similar infrastructure, using similar algorithms implemented in similar frameworks. The potential in sharing aggregated profiling runs to collaboratively address the cold start problem is largely unexplored. We present Karasu, an approach to more efficient resource configuration profiling that promotes data…
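The abstract argues that profiling runs shared across users can soften the cold-start problem. The toy Python sketch below is not Karasu's algorithm; it only illustrates that idea under assumed data. It takes hypothetical shared records of the form (algorithm, machine type, cluster size, cost), averages the observed cost for each (machine type, cluster size) pair among runs of the same algorithm, and returns the cheapest pairs as first candidates to profile. The record layout, the function name `warm_start_candidates`, and the averaging heuristic are all illustrative assumptions.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical shared profiling record: (algorithm, machine_type, scale_out, cost_usd).
# This layout is an assumption for illustration only.
SharedRun = tuple[str, str, int, float]


def warm_start_candidates(shared_runs: list[SharedRun], algorithm: str, k: int = 3) -> list[tuple[str, int]]:
    """Rank cluster configurations for a new job using other users' runs.

    Toy heuristic (an assumption, not the method from the paper): group runs of the
    same algorithm by (machine_type, scale_out), average their observed cost, and
    return the k cheapest configurations as initial profiling candidates.
    """
    costs: dict[tuple[str, int], list[float]] = defaultdict(list)
    for algo, machine_type, scale_out, cost in shared_runs:
        if algo == algorithm:
            costs[(machine_type, scale_out)].append(cost)
    # Sort configurations by mean observed cost (cheapest first).
    ranked = sorted(costs, key=lambda cfg: mean(costs[cfg]))
    # An empty result means no shared data exists: a true cold start.
    return ranked[:k]
```

For example, with shared runs `("kmeans", "c5.2xlarge", 4, 1.50)`, `("kmeans", "c5.2xlarge", 4, 1.70)`, `("kmeans", "m5.xlarge", 8, 1.80)`, and `("kmeans", "m5.xlarge", 4, 2.10)`, calling `warm_start_candidates(runs, "kmeans", k=2)` ranks `("c5.2xlarge", 4)` first (mean cost 1.60) and `("m5.xlarge", 8)` second. A job type with no shared runs yields an empty list, which is exactly the cold-start case that sharing data is meant to avoid.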
Taxonomy
Topics: Cloud Computing and Resource Management · IoT and Edge/Fog Computing · Data Stream Mining Techniques
