Flora: Efficient Cloud Resource Selection for Big Data Processing via Job Classification

Jonathan Will; Lauritz Thamsen; Jonathan Bader; Odej Kao

arXiv:2502.21046 · cs.DC · March 3, 2025

TL;DR

Flora is a low-overhead method for selecting cost-effective cloud resources for big data jobs. Users classify a job by its data access pattern, and Flora derives a cluster configuration from test-job executions of the same class at current resource prices, staying close to the cost-optimal configuration.

Contribution

Flora introduces a novel job classification-based approach for optimizing cloud resource selection tailored to data access patterns in big data processing.

Findings

01 Achieves an average deviation below 6% from optimal cost

02 Handles diverse job categories with high accuracy

03 Reduces resource selection overhead significantly

Abstract

Distributed dataflow systems like Spark and Flink enable data-parallel processing of large datasets on clusters of cloud resources. Yet, selecting appropriate computational resources for dataflow jobs is often challenging. For efficient execution, individual resource allocations, such as memory and CPU cores, must meet the specific resource demands of the job. Meanwhile, the choices of cloud configurations are often plentiful, especially in public clouds, and the current cost of the available resource options can fluctuate. Addressing this challenge, we present Flora, a low-overhead approach to cost-optimizing cloud cluster configurations for big data processing. Flora lets users categorize jobs according to their data access patterns and derives suitable cluster resource configurations from executions of test jobs of the same category, considering current resource costs. In our…
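The selection idea described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the record layout `ProfiledRun`, the function `select_config`, the category and configuration names, and all numbers are hypothetical, and the scoring rule (lowest mean of runtime times current hourly price) is an assumed stand-in for whatever ranking Flora actually uses.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class ProfiledRun:
    """One execution of a test job (hypothetical record layout)."""
    category: str         # job class assigned by the user, e.g. by data access pattern
    config: str           # name of the cluster configuration that was used
    runtime_hours: float  # measured runtime on that configuration


def select_config(category, profiled_runs, hourly_price):
    """Pick the configuration with the lowest mean cost for one job class.

    Cost of a run = runtime_hours * current hourly price of its configuration.
    Only test runs of the same category with a known price are considered.
    """
    costs = defaultdict(list)
    for run in profiled_runs:
        if run.category == category and run.config in hourly_price:
            costs[run.config].append(run.runtime_hours * hourly_price[run.config])
    if not costs:
        raise ValueError(f"no priced test runs for category {category!r}")
    # Rank configurations by their mean cost across the category's test jobs.
    return min(costs, key=lambda cfg: sum(costs[cfg]) / len(costs[cfg]))


# Illustrative numbers only, not measurements from the paper.
RUNS = [
    ProfiledRun("class_a", "small", 2.0),
    ProfiledRun("class_a", "large", 0.8),
    ProfiledRun("class_b", "small", 1.0),
    ProfiledRun("class_b", "large", 0.9),
]
PRICES_TODAY = {"small": 1.0, "large": 2.0}
PRICES_LATER = {"small": 1.0, "large": 3.0}

choice_today = select_config("class_a", RUNS, PRICES_TODAY)
choice_later = select_config("class_a", RUNS, PRICES_LATER)
```

With these made-up inputs, `class_a` jobs map to the `large` configuration under the first price table and switch to `small` once `large` becomes more expensive, which is the kind of price sensitivity the abstract alludes to. A new job needs only a category label to get a recommendation; no profiling of the job itself is involved in this sketch.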

Code & Models

Repositories

dos-group/flora (official)
