Cost models for geo-distributed massively parallel streaming analytics
Anna-Valentini Michailidou, Anastasios Gounaris, Konstantinos Tsichlas

TL;DR
This paper presents a comprehensive cost model for geo-distributed, massively parallel streaming analytics that considers heterogeneity, geo-location, and complex dataflow structures to optimize task placement and configuration.
Contribution
It introduces a novel data quality-aware cost model integrating multiple aspects of modern dataflows for optimization purposes.
Findings
The cost model effectively captures heterogeneity and geo-distribution effects.
It enables cost-based optimization for task placement.
It supports complex DAGs and streaming applications.
Abstract
This report is part of the DataflowOpt project on optimization of modern dataflows and aims to introduce a data quality-aware cost model that covers the following aspects in combination: (1) heterogeneity in compute nodes, (2) geo-distribution, (3) massive parallelism, (4) complex DAGs, and (5) streaming applications. Such a cost model can then be leveraged to devise cost-based optimization solutions that deal with task placement and operator configuration.
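To make the idea of cost-based task placement concrete, the following is a minimal toy sketch, not the cost model from the report. It assumes a cost that simply adds per-task compute time (work divided by node speed, standing in for heterogeneity) to data transfer time across DAG edges whose endpoints sit on different nodes (standing in for geo-distribution). All task names, node names, and numbers are hypothetical, and parallelism, streaming rates, and data quality are left out entirely.

```python
from itertools import product

# Hypothetical inputs, invented for illustration only.
# Work per task (abstract units) and speed per node (units per second).
TASK_WORK = {"source": 2.0, "filter": 4.0, "sink": 1.0}
NODE_SPEED = {"edge": 1.0, "cloud": 4.0}
# DAG edges as (producer, consumer, data volume in MB).
EDGES = [("source", "filter", 10.0), ("filter", "sink", 2.0)]
# Seconds per MB between distinct nodes; co-located tasks transfer for free.
LINK_COST = {("edge", "cloud"): 0.5, ("cloud", "edge"): 0.5}
# Tasks pinned to a location (assumed here: the source runs at the edge).
PINNED = {"source": "edge"}


def placement_cost(placement):
    """Toy cost: total compute time plus total cross-node transfer time."""
    compute = sum(TASK_WORK[t] / NODE_SPEED[placement[t]] for t in TASK_WORK)
    transfer = sum(
        mb * LINK_COST.get((placement[u], placement[v]), 0.0)
        for u, v, mb in EDGES
    )
    return compute + transfer


def best_placement():
    """Enumerate every task-to-node assignment and keep the cheapest one."""
    tasks = list(TASK_WORK)
    nodes = list(NODE_SPEED)
    candidates = []
    for combo in product(nodes, repeat=len(tasks)):
        placement = dict(zip(tasks, combo))
        # Discard assignments that violate a pinning constraint.
        if all(placement[t] == n for t, n in PINNED.items()):
            candidates.append(placement)
    return min(candidates, key=placement_cost)


if __name__ == "__main__":
    choice = best_placement()
    print(choice, placement_cost(choice))
```

With these made-up numbers the faster cloud node is not chosen, because moving 10 MB over the slow link costs more than the compute time it saves. Exhaustive enumeration is used only to keep the sketch short; it grows exponentially with the number of tasks and would not scale to massively parallel dataflows.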
Taxonomy
Topics: Distributed and Parallel Computing Systems · Scientific Computing and Data Management · Cloud Computing and Resource Management
