Partitioning SKA Dataflows for Optimal Graph Execution
Chen Wu, Andreas Wicenec, Rodrigo Tobar

TL;DR
This paper presents the graph scheduling sub-system of DALiuGE, a graph execution engine for SKA data processing. The sub-system aims to reduce workflow execution time and resource footprint, and is demonstrated on radio astronomy pipelines.
Contribution
The paper describes the DALiuGE graph scheduling sub-system, which extends previous studies on graph scheduling and partitioning to improve the efficiency and resource management of SKA data workflows.
Findings
Preliminary results show improved workflow performance.
The approach effectively manages large-scale data pipelines.
Optimization methods reduce execution time and resource footprint.
Abstract
Optimizing data-intensive workflow execution is essential to many modern scientific projects such as the Square Kilometre Array (SKA), which will be the largest radio telescope in the world, collecting terabytes of data per second for the next few decades. At the core of the SKA Science Data Processor is the graph execution engine, scheduling tens of thousands of algorithmic components to ingest and transform millions of parallel data chunks in order to solve a series of large-scale inverse problems within the power budget. To tackle this challenge, we have developed the Data Activated Liu Graph Engine (DALiuGE) to manage data processing pipelines for several SKA pathfinder projects. In this paper, we discuss the DALiuGE graph scheduling sub-system. By extending previous studies on graph scheduling and partitioning, we lay the foundation on which we can develop polynomial time…
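The two quantities named above, workflow execution time and resource footprint, can be made concrete with a small sketch. The code below is not DALiuGE code and does not use its API; the function `schedule_metrics`, the task names, and the durations are all hypothetical. It assumes a workflow given as a directed acyclic graph with a positive run time per task and unlimited workers, starts every task as soon as its predecessors finish, and reports the resulting completion time together with the peak number of tasks running at once (a simple stand-in for resource footprint).

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def schedule_metrics(durations, deps):
    """Return (completion_time, peak_parallelism) for an as-soon-as-possible schedule.

    durations: {task: run time}, assumed positive (hypothetical units)
    deps:      {task: set of tasks that must finish first}
    """
    graph = {task: deps.get(task, ()) for task in durations}
    start, finish = {}, {}
    # static_order() yields each task after all of its predecessors.
    for task in TopologicalSorter(graph).static_order():
        start[task] = max((finish[p] for p in graph[task]), default=0)
        finish[task] = start[task] + durations[task]
    completion_time = max(finish.values(), default=0)

    # Sweep over start (+1) and finish (-1) events to count concurrent tasks.
    events = [(start[t], 1) for t in durations] + [(finish[t], -1) for t in durations]
    events.sort()  # at equal times, -1 sorts before +1, so finishes free a slot first
    running = peak = 0
    for _, delta in events:
        running += delta
        peak = max(peak, running)
    return completion_time, peak


# Hypothetical four-task pipeline: one ingest, two parallel calibrations, one imaging step.
durations = {"ingest": 2, "cal_a": 3, "cal_b": 4, "image": 1}
deps = {"cal_a": {"ingest"}, "cal_b": {"ingest"}, "image": {"cal_a", "cal_b"}}
print(schedule_metrics(durations, deps))  # (7, 2)
```

In this toy case the completion time is 7, set by the longest dependency chain (ingest, cal_b, image), and at most 2 tasks run concurrently. Forcing the two calibration tasks onto a single worker would lower the peak to 1 at the cost of a longer run, which is the kind of trade-off a scheduling and partitioning method has to balance.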
