Cost optimization of data flows based on task re-ordering

Georgia Kougka; Anastasios Gounaris

arXiv:1507.08492·cs.DB·July 31, 2015

TL;DR

This paper introduces approximate algorithms that re-order the tasks of a data flow to minimize its total execution cost. Validation in a real tool and on synthetic flows shows significant speed-ups and plans much closer to the optimal solution.

Contribution

It proposes novel approximate algorithms for task re-ordering in data flows, addressing the lack of efficient, scalable, cost-based optimization solutions.

Findings

01 Achieved significant speed-ups in data flow execution

02 Moved much closer to optimal solutions than state-of-the-art proposals

03 Validated the proposals in a real tool and on synthetic flows

Abstract

Analyzing big data in a highly dynamic environment is becoming more and more critical because of the increasing need for end-to-end processing of this data. Modern data flows are quite complex, and there are no efficient, cost-based, fully automated, scalable optimization solutions that can assist flow designers. State-of-the-art proposals fail to provide near-optimal solutions even for simple data flows. To tackle this problem, we introduce a set of approximate algorithms for defining the execution order of the constituent tasks so as to minimize the total execution cost of a data flow. We also present the advantages of the parallel execution of data flows. We validated our proposals both in a real tool and on synthetic flows, and the results show that we can achieve significant speed-ups, moving much closer to optimal solutions.
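
To make the optimization problem in the abstract concrete, the following minimal sketch may help. It is not one of the paper's algorithms. It assumes a linear flow in which each task has a cost per input record and a selectivity (the fraction of records it passes on), and in which some tasks must precede others. The task names, numbers, cost model, and the functions `flow_cost`, `best_order`, and `greedy_order` are all hypothetical and chosen only for illustration.

```python
from itertools import permutations

# Hypothetical tasks: name -> (cost per input record, selectivity).
TASKS = {
    "A": (1.0, 1.0),
    "B": (5.0, 0.5),
    "C": (2.0, 0.1),
}
# Hypothetical precedence constraints: (before, after) pairs.
PRECEDENCE = {("A", "C")}


def flow_cost(order, tasks):
    """Sum each task's cost, scaled by the data volume that reaches it."""
    total, volume = 0.0, 1.0
    for name in order:
        cost, selectivity = tasks[name]
        total += volume * cost
        volume *= selectivity
    return total


def is_valid(order, precedence):
    """Check that every (before, after) pair is respected by the ordering."""
    position = {name: i for i, name in enumerate(order)}
    return all(position[a] < position[b] for a, b in precedence)


def best_order(tasks, precedence):
    """Exhaustive search over valid orderings; only feasible for tiny flows."""
    valid = (p for p in permutations(tasks) if is_valid(p, precedence))
    return min(valid, key=lambda p: flow_cost(p, tasks))


def greedy_order(tasks, precedence):
    """Simple baseline: pick the eligible task with the highest
    (1 - selectivity) / cost, using the name as a tie-breaker."""
    remaining, order = set(tasks), []
    while remaining:
        eligible = [
            t for t in sorted(remaining)
            if all(a not in remaining for a, b in precedence if b == t)
        ]
        chosen = max(
            eligible,
            key=lambda t: ((1.0 - tasks[t][1]) / tasks[t][0], t),
        )
        order.append(chosen)
        remaining.remove(chosen)
    return order


if __name__ == "__main__":
    optimal = best_order(TASKS, PRECEDENCE)
    greedy = greedy_order(TASKS, PRECEDENCE)
    print("optimal:", optimal, flow_cost(optimal, TASKS))
    print("greedy: ", greedy, flow_cost(greedy, TASKS))
```

In this toy instance the exhaustive search returns the order A, C, B with cost 3.5, while the greedy baseline returns B, A, C with cost 6.5, because the highly selective task C is blocked behind A. Exhaustive search grows factorially with the number of tasks, which is the kind of gap between cheap heuristics and exact search that motivates approximate re-ordering algorithms.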
