Design and Execution of make-like, distributed Analyses based on Spotify's Pipelining Package Luigi
Marcel Rieger, Martin Erdmann, Benjamin Fischer, Robert Fischer

TL;DR
This paper presents a scalable, make-like analysis system based on Spotify's Luigi package and tailored to high-energy physics workflows, enabling automated, parallelized, and distributed data analysis.
Contribution
It introduces a generic analysis design pattern and extends Luigi with features for high-energy physics, including WLCG support, automated resubmission, and a Python interface for complex workflows.
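The idea of automated resubmission can be illustrated with a small sketch. It is not the paper's implementation and does not use Luigi or any WLCG middleware: the function `run_with_resubmission`, its `max_attempts` parameter, and the `flaky` job are hypothetical names chosen for illustration, and a "job" is simply assumed to be a Python callable that raises an exception on failure.

```python
from typing import Callable, Dict, List


def run_with_resubmission(
    jobs: Dict[str, Callable[[], None]],
    max_attempts: int = 3,
) -> Dict[str, int]:
    """Run every job, resubmitting failed ones up to max_attempts times.

    Returns the number of attempts used per job and raises RuntimeError
    if a job still fails on its final attempt.
    """
    attempts = {name: 0 for name in jobs}
    pending: List[str] = list(jobs)
    while pending:
        failed = []
        for name in pending:
            attempts[name] += 1
            try:
                jobs[name]()
            except Exception:
                if attempts[name] >= max_attempts:
                    raise RuntimeError(
                        f"job {name!r} failed {max_attempts} times"
                    )
                # Collect the job so it is resubmitted in the next round.
                failed.append(name)
        pending = failed
    return attempts


# Hypothetical job that fails twice before succeeding.
calls = {"n": 0}


def flaky() -> None:
    calls["n"] += 1
    if calls["n"] < 3:
        raise IOError("transient failure")


result = run_with_resubmission({"flaky": flaky, "stable": lambda: None})
print(result)  # {'flaky': 3, 'stable': 1}
```

Only the failed jobs are retried in each round, so work that already succeeded is never repeated.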
Findings
Supports distributed execution of complex physics analyses
Integrates WLCG infrastructure for job submission and file access
Enables automated, parallelized data analysis workflows
Abstract
In high-energy particle physics, workflow management systems are primarily used as tailored solutions in dedicated areas such as Monte Carlo production. However, physicists performing data analyses are usually required to steer their individual workflows manually, which is time-consuming and often leads to undocumented relations between particular workloads. We present a generic analysis design pattern that copes with the sophisticated demands of end-to-end HEP analyses and provides a make-like execution system. It is based on the open-source pipelining package Luigi, which was developed at Spotify and enables the definition of arbitrary workloads, so-called Tasks, and the dependencies between them in a lightweight and scalable structure. Further features are multi-user support, automated dependency resolution and error handling, central scheduling, and status visualization on the web. In…
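The make-like behavior described in the abstract can be sketched in a few lines of plain Python. The sketch below deliberately does not import Luigi: the `Task` class is a hypothetical stand-in whose method names (`requires`, `output`, `run`, `complete`) mirror those of Luigi's Task interface, and `build`, `Selection`, and `Histogram` are illustrative names, not part of the paper. A task is assumed to be complete once its output file exists.

```python
import os
import tempfile


class Task:
    """Hypothetical minimal task: complete when its output file exists."""

    def __init__(self, workdir):
        self.workdir = workdir

    def requires(self):
        return []

    def output(self):
        raise NotImplementedError

    def run(self):
        raise NotImplementedError

    def complete(self):
        return os.path.exists(self.output())


def build(task, log):
    """Make-like execution: resolve dependencies, skip complete tasks."""
    if task.complete():
        return
    for dep in task.requires():
        build(dep, log)
    task.run()
    log.append(type(task).__name__)


class Selection(Task):
    def output(self):
        return os.path.join(self.workdir, "selected.txt")

    def run(self):
        with open(self.output(), "w") as f:
            f.write("1\n2\n3\n")


class Histogram(Task):
    def requires(self):
        return [Selection(self.workdir)]

    def output(self):
        return os.path.join(self.workdir, "hist.txt")

    def run(self):
        # Read the dependency's output and write a derived result.
        with open(self.requires()[0].output()) as f:
            total = sum(int(line) for line in f)
        with open(self.output(), "w") as f:
            f.write(str(total))


workdir = tempfile.mkdtemp()
first_log, second_log = [], []
build(Histogram(workdir), first_log)   # runs Selection, then Histogram
build(Histogram(workdir), second_log)  # nothing to do: outputs exist
print(first_log, second_log)
```

Requesting only the final task is enough: its dependencies are resolved automatically, and a second request does no work because all outputs already exist, which is the same principle `make` applies to build targets.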
Taxonomy
Topics: Distributed and Parallel Computing Systems · Scientific Computing and Data Management · Advanced Data Storage Technologies
