Ruya: Memory-Aware Iterative Optimization of Cluster Configurations for Big Data Processing
Jonathan Will, Lauritz Thamsen, Jonathan Bader, Dominik Scheinert, Odej Kao

TL;DR
Ruya is a memory-aware iterative optimization method that finds suitable cluster configurations for big data processing: it models a job's memory usage to narrow the search space, then uses Bayesian optimization within it to reduce search effort.
Contribution
It introduces a novel memory profiling and Bayesian optimization approach to significantly reduce the search space and iterations for optimal cluster configuration in big data processing.
Findings
Reduced search iterations by about 50% compared to the baseline
Effective memory modeling from small sample runs
Improved cost and performance trade-offs in cluster setup
Abstract
Selecting appropriate computational resources for data processing jobs on large clusters is difficult, even for expert users like data engineers. Inadequate choices can result in vastly increased costs, without significantly improving performance. One crucial aspect of selecting an efficient resource configuration is avoiding memory bottlenecks. By knowing the required memory of a job in advance, the search space for an optimal resource configuration can be greatly reduced. Therefore, we present Ruya, a method for memory-aware optimization of data processing cluster configurations based on iteratively exploring a narrowed-down search space. First, we perform job profiling runs with small samples of the dataset on just a single machine to model the job's memory usage patterns. Second, we prioritize cluster configurations with a suitable amount of total memory and within this reduced…
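The first two steps of the abstract (profile on small samples, then prefer configurations with enough total memory) can be illustrated with a minimal sketch. It is not the authors' implementation: the linear memory model, the `ClusterConfig` type, the function names, and all numbers are assumptions chosen for illustration, and the subsequent Bayesian optimization step is not shown.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClusterConfig:
    """Hypothetical description of one candidate cluster configuration."""
    machine_type: str
    nodes: int
    memory_gib_per_node: float

    @property
    def total_memory_gib(self) -> float:
        return self.nodes * self.memory_gib_per_node


def fit_linear(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def estimate_memory_gib(sample_sizes_gib, peak_memory_gib, full_size_gib):
    """Extrapolate peak memory from small profiling runs to the full dataset.

    Assumes memory grows linearly with input size, which is an
    illustrative simplification, not a claim about the paper's model.
    """
    slope, intercept = fit_linear(sample_sizes_gib, peak_memory_gib)
    return slope * full_size_gib + intercept


def prioritize(configs, required_gib):
    """Split candidates into those with enough total memory and the rest."""
    enough = [c for c in configs if c.total_memory_gib >= required_gib]
    too_small = [c for c in configs if c.total_memory_gib < required_gib]
    return enough, too_small


# Made-up profiling data: three single-machine runs on small samples.
required = estimate_memory_gib([1.0, 2.0, 4.0], [2.5, 4.5, 8.5], 100.0)
candidates = [
    ClusterConfig("general", 4, 32.0),
    ClusterConfig("memory-optimized", 4, 64.0),
    ClusterConfig("memory-optimized", 8, 64.0),
]
preferred, deferred = prioritize(candidates, required)
```

In this sketch an iterative search would explore `preferred` first and fall back to `deferred` only if needed; how Ruya actually orders and explores the narrowed space is described in the paper itself.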
Taxonomy
Topics: Cloud Computing and Resource Management · IoT and Edge/Fog Computing · Age of Information Optimization
