Partitioning SKA Dataflows for Optimal Graph Execution

Chen Wu; Andreas Wicenec; Rodrigo Tobar

arXiv:1805.07568 · cs.DC · May 22, 2018

TL;DR

This paper describes the graph scheduling sub-system of DALiuGE, the graph execution engine developed for SKA data processing. The scheduler aims to reduce both workflow execution time and resource footprint, and is demonstrated on radio astronomy pipelines.

Contribution

The paper extends previous studies on graph scheduling and partitioning to lay the foundation for scheduling SKA data workflows with lower execution time and better resource management.

Findings

01. Preliminary results show improved workflow performance.

02. The approach effectively manages large-scale data pipelines.

03. Optimization methods reduce execution time and resource footprint.

Abstract

Optimizing data-intensive workflow execution is essential to many modern scientific projects such as the Square Kilometre Array (SKA), which will be the largest radio telescope in the world, collecting terabytes of data per second for the next few decades. At the core of the SKA Science Data Processor is the graph execution engine, scheduling tens of thousands of algorithmic components to ingest and transform millions of parallel data chunks in order to solve a series of large-scale inverse problems within the power budget. To tackle this challenge, we have developed the Data Activated Liu Graph Engine (DALiuGE) to manage data processing pipelines for several SKA pathfinder projects. In this paper, we discuss the DALiuGE graph scheduling sub-system. By extending previous studies on graph scheduling and partitioning, we lay the foundation on which we can develop polynomial time…