Cost optimization of data flows based on task re-ordering
Georgia Kougka, Anastasios Gounaris

TL;DR
This paper introduces approximate algorithms for task re-ordering in data flows to minimize total execution cost, and reports significant speed-ups over state-of-the-art proposals, validated in a real tool and on synthetic flows.
Contribution
It proposes novel approximate algorithms for task re-ordering in data flows, addressing the lack of efficient, scalable, cost-based optimization solutions.
Findings
Achieved significant speed-ups in data flow execution
Moved closer to optimal solutions compared to state-of-the-art
Validated the proposals in a real tool and on synthetic flows
Abstract
Analyzing big data in a highly dynamic environment is becoming more and more critical because of the increasing need for end-to-end processing of this data. Modern data flows are quite complex, and there are no efficient, cost-based, fully automated, scalable optimization solutions that can assist flow designers. The state-of-the-art proposals fail to provide near-optimal solutions even for simple data flows. To tackle this problem, we introduce a set of approximate algorithms for defining the execution order of the constituent tasks in order to minimize the total execution cost of a data flow. We also present the advantages of the parallel execution of data flows. We validated our proposals in both a real tool and synthetic flows, and the results show that we can achieve significant speed-ups, moving much closer to optimal solutions.
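The abstract does not spell out the cost model or the algorithms themselves, so the following is only a minimal sketch of what cost-based task re-ordering can look like, not the authors' method. It assumes a linear flow in which each task has a per-record cost and a selectivity (the fraction of records it passes on), a total cost equal to the sum of each task's cost weighted by the fraction of input that reaches it, and a set of precedence constraints. The task names, numbers, and the greedy rule (pick the ready task that discards the most data per unit of cost) are all illustrative assumptions.

```python
from itertools import permutations

# Hypothetical tasks: name -> (cost per input record, selectivity).
TASKS = {
    "parse": (1.0, 1.0),
    "filter_a": (2.0, 0.5),
    "filter_b": (1.0, 0.9),
    "enrich": (5.0, 1.0),
}
# Hypothetical precedence constraints as (before, after) pairs.
PRECEDENCE = {("parse", "filter_a"), ("parse", "filter_b"), ("parse", "enrich")}


def flow_cost(order, tasks):
    """Assumed cost model: sum of task cost times the fraction of input reaching it."""
    total, fraction = 0.0, 1.0
    for name in order:
        cost, selectivity = tasks[name]
        total += cost * fraction
        fraction *= selectivity
    return total


def is_valid(order, precedence):
    """Check that every (before, after) constraint holds in the given order."""
    position = {name: i for i, name in enumerate(order)}
    return all(position[a] < position[b] for a, b in precedence)


def greedy_order(tasks, precedence):
    """Heuristic: among tasks whose prerequisites are placed, take the best rank."""
    remaining, order = set(tasks), []
    while remaining:
        ready = [
            t for t in remaining
            if all(a not in remaining for a, b in precedence if b == t)
        ]
        if not ready:
            raise ValueError("precedence constraints contain a cycle")
        # Rank = data discarded per unit of cost; ties broken by name.
        best = min(ready, key=lambda t: (-(1 - tasks[t][1]) / tasks[t][0], t))
        order.append(best)
        remaining.remove(best)
    return order


def exhaustive_order(tasks, precedence):
    """Exact baseline for tiny flows: try every valid permutation."""
    valid = (p for p in permutations(tasks) if is_valid(p, precedence))
    return list(min(valid, key=lambda p: flow_cost(p, tasks)))


if __name__ == "__main__":
    initial = ["parse", "enrich", "filter_b", "filter_a"]
    reordered = greedy_order(TASKS, PRECEDENCE)
    print("initial  ", initial, flow_cost(initial, TASKS))
    print("greedy   ", reordered, flow_cost(reordered, TASKS))
    best = exhaustive_order(TASKS, PRECEDENCE)
    print("exhaustive", best, flow_cost(best, TASKS))
```

On this toy flow the greedy ordering happens to match the exhaustive optimum, but that is not guaranteed in general once precedence constraints interact. The exhaustive baseline also grows factorially with the number of tasks, which is the kind of scalability gap that approximate algorithms are meant to close.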
