Bellamy: Reusing Performance Models for Distributed Dataflow Jobs Across Contexts

Dominik Scheinert; Lauritz Thamsen; Houkun Zhu; Jonathan Will; Alexander Acker; Thorsten Wittkopp; Odej Kao

arXiv:2107.13921·cs.DC·October 19, 2021

TL;DR

Bellamy is a novel performance modeling approach for distributed dataflow jobs that effectively reuses historical execution data across different contexts, improving resource selection accuracy.

Contribution

It introduces a two-step modeling process that combines general models with context-specific optimization, enabling better performance predictions across diverse environments.
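The two-step idea can be illustrated with a toy sketch: fit one general model on runtimes pooled from several contexts, then continue training it on a few runs from a new context. Everything below is an assumption made for illustration, not the paper's model: the three-term linear runtime formula, the synthetic `runtime` ground truth, the `per_gb_cost` parameter standing in for a context, and all helper names are hypothetical. Unlike Bellamy, the sketch does not encode descriptive properties of the job.

```python
def features(scale_out, dataset_gb):
    # Bias, per-node data share, and scale-out overhead, scaled to [0, 1]
    # for the value ranges used below (an illustrative choice).
    return [1.0, (dataset_gb / scale_out) / 10.0, scale_out / 16.0]


def predict(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))


def fit(samples, w, lr=0.2, steps=5000):
    """Gradient descent on mean squared error, starting from weights w."""
    w = list(w)
    n = len(samples)
    for _ in range(steps):
        grad = [0.0] * len(w)
        for x, y in samples:
            err = predict(w, x) - y
            for j, xj in enumerate(x):
                grad[j] += 2.0 * err * xj / n
        w = [wj - lr * gj for wj, gj in zip(w, grad)]
    return w


def mean_abs_error(w, samples):
    return sum(abs(predict(w, x) - y) for x, y in samples) / len(samples)


def runtime(scale_out, dataset_gb, per_gb_cost):
    # Synthetic ground truth in seconds; per_gb_cost stands in for a context
    # (for example a node type). Made up for this example.
    return 20.0 + per_gb_cost * dataset_gb / scale_out + 1.5 * scale_out


def make_samples(configs, per_gb_cost):
    return [(features(s, d), runtime(s, d, per_gb_cost)) for s, d in configs]


grid = [(s, d) for s in (2, 4, 8, 16) for d in (10, 20)]

# Step 1: general model trained on executions pooled from two known contexts.
pooled = make_samples(grid, 30.0) + make_samples(grid, 40.0)
general = fit(pooled, [0.0, 0.0, 0.0])

# Step 2: context-specific optimization, starting from the general weights
# and using only four executions from a new context.
target_train = make_samples([(2, 10), (8, 10), (16, 20), (4, 20)], 50.0)
target_test = make_samples([(2, 20), (4, 10), (8, 20), (16, 10)], 50.0)
finetuned = fit(target_train, general)

err_general = mean_abs_error(general, target_test)
err_finetuned = mean_abs_error(finetuned, target_test)
print(f"general model error on new context:    {err_general:.2f} s")
print(f"fine-tuned model error on new context: {err_finetuned:.2f} s")
```

In this noiseless toy setting the fine-tuned weights fit the new context closely, while the general model alone carries the averaged behaviour of the contexts it was trained on. Real execution data is noisy, so the gap would be smaller in practice.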

Findings

01 Outperforms state-of-the-art methods on public datasets

02 Effectively captures job execution context

03 Reduces need for retraining models for new environments

Abstract

Distributed dataflow systems enable the use of clusters for scalable data analytics. However, selecting appropriate cluster resources for a processing job is often not straightforward. Performance models trained on historical executions of a concrete job are helpful in such situations, yet they are usually bound to a specific job execution context (e.g. node type, software versions, job parameters) due to the few considered input parameters. Even in case of slight context changes, such supportive models need to be retrained and cannot benefit from historical execution data from related contexts. This paper presents Bellamy, a novel modeling approach that combines scale-outs, dataset sizes, and runtimes with additional descriptive properties of a dataflow job. It is thereby able to capture the context of a job execution. Moreover, Bellamy is realizing a two-step modeling approach.…

Code & Models

Repositories

dos-group/bellamy-runtime-prediction (PyTorch, official)
