DALiuGE: A Graph Execution Framework for Harnessing the Astronomical Data Deluge
Chen Wu, Rodrigo Tobar, Kevin Vinsen, Andreas Wicenec, Dave Pallot, Baoqiang Lao, Ruonan Wang, Tao An, Mark Boulton, Ian Cooper, Richard Dodson, Markus Dolensky, Ying Mei, Feng Wang

TL;DR
DALiuGE is a scalable, data-activated graph execution framework designed for large-scale astronomical data processing, enabling flexible and efficient pipeline execution across distributed resources, including supercomputers.
Contribution
It introduces a novel data-activated execution model and a flexible interface for expressing complex astronomical data reduction pipelines, supporting scalability from laptops to supercomputers.
Findings
Supports pipelines ranging from fewer than ten tasks to tens of millions of tasks.
Used in production for radio interferometry data reduction.
Demonstrates scalability and flexibility in large-scale astronomical data processing.
Abstract
The Data Activated Liu Graph Engine - DALiuGE - is an execution framework for processing large astronomical datasets at a scale required by the Square Kilometre Array Phase 1 (SKA1). It includes an interface for expressing complex data reduction pipelines consisting of both data sets and algorithmic components and an implementation run-time to execute such pipelines on distributed resources. By mapping the logical view of a pipeline to its physical realisation, DALiuGE separates the concerns of multiple stakeholders, allowing them to collectively optimise large-scale data processing solutions in a coherent manner. The execution in DALiuGE is data-activated, where each individual data item autonomously triggers the processing on itself. Such decentralisation also makes the execution framework very scalable and flexible, supporting pipeline sizes ranging from less than ten tasks running…
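The data-activated idea in the abstract, where a completed data item triggers the processing that depends on it, can be illustrated with a small Python sketch. This is not DALiuGE's API: the `DataItem` and `Task` classes, their methods, and the `run_example` helper are hypothetical names invented for illustration, and the sketch runs in a single process with no distribution or scheduling.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, List


@dataclass
class DataItem:
    """Hypothetical data node: holds a value and the tasks that consume it."""
    name: str
    value: Any = None
    completed: bool = False
    consumers: List["Task"] = field(default_factory=list)

    def complete(self, value: Any) -> None:
        # Completing the data item is the event that drives execution:
        # there is no central scheduler, the item notifies its consumers.
        self.value = value
        self.completed = True
        for task in self.consumers:
            task.input_completed()


@dataclass
class Task:
    """Hypothetical task node: runs once all of its inputs are complete."""
    name: str
    func: Callable[..., Any]
    inputs: List[DataItem]
    output: DataItem
    pending: int = 0

    def __post_init__(self) -> None:
        # Register with every input so each one can trigger this task.
        self.pending = len(self.inputs)
        for item in self.inputs:
            item.consumers.append(self)

    def input_completed(self) -> None:
        self.pending -= 1
        if self.pending == 0:
            result = self.func(*(item.value for item in self.inputs))
            # Completing the output propagates activation downstream.
            self.output.complete(result)


def run_example() -> int:
    """Build a two-task graph and activate it by completing its inputs."""
    raw_a = DataItem("raw_a")
    raw_b = DataItem("raw_b")
    summed = DataItem("summed")
    scaled = DataItem("scaled")
    Task("add", lambda a, b: a + b, [raw_a, raw_b], summed)
    Task("scale", lambda s: s * 10, [summed], scaled)
    raw_a.complete(1)
    raw_b.complete(2)
    return scaled.value
```

In this toy graph nothing runs until the raw inputs are completed. The `add` task fires only after both of its inputs have arrived, and its output in turn activates `scale`, so the order of execution follows the flow of data rather than an external loop over tasks.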
Taxonomy
Topics: Parallel Computing and Optimization Techniques · Advanced Data Storage Technologies · Distributed and Parallel Computing Systems
