Africanus IV. The Stimela2 framework: scalable and reproducible workflows, from local to cloud compute
Oleg M. Smirnov, Sphesihle Makhathini, Jonathan S. Kenyon, Hertzog L. Bester, Simon J. Perkins, Athanaseus J.T. Ramaila, Benjamin V. Hugo

TL;DR
Stimela2 is a flexible, scalable framework for creating reproducible data reduction workflows in radio astronomy, adaptable to local and cloud computing environments using YAML recipes and containerization.
Contribution
Stimela2 introduces a modular, YAML-based workflow system that integrates containerization and cloud deployment for scalable data processing.
Findings
Supports containerized execution for reproducibility
Enables deployment on cloud and HPC systems
Provides a user-friendly, modular workflow design
Abstract
Stimela2 is a new-generation framework for developing data reduction workflows. It is designed for radio astronomy data but can be adapted for other data processing applications. Stimela2 aims at the middle ground between ease of development, human readability, and enabling robust, scalable and reproducible workflows. It represents workflows by linear, concise and intuitive YAML-format "recipes". Atomic data reduction tasks (binary executables, Python functions and code, and CASA tasks) are described by YAML-format "cab definitions" detailing each task's "schema" (inputs and outputs). Stimela2 provides a rich syntax for chaining tasks together, and encourages a high degree of modularity: recipes may be nested into other recipes, and configuration is cleanly separated from recipe logic. Tasks can be executed natively or in isolated environments using containerization technologies such as…
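The ideas in the abstract (tasks described by an input/output schema, linear chaining of steps, and recipes nested inside other recipes) can be illustrated with a small conceptual sketch. This is plain Python, not Stimela2's YAML syntax or API: the names `Cab`, `Recipe`, `Ref`, and the `scale` and `total` tasks are hypothetical and exist only for this illustration.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Tuple


@dataclass(frozen=True)
class Ref:
    """Hypothetical pointer to an earlier step's output or a recipe input."""
    step: str
    key: str


def _check(owner: str, kind: str, schema: Dict[str, type], values: Dict[str, Any]) -> None:
    # Enforce the declared schema: every named item must exist with the right type.
    for key, expected in schema.items():
        if key not in values:
            raise ValueError(f"{owner}: missing {kind} '{key}'")
        if not isinstance(values[key], expected):
            raise TypeError(f"{owner}: {kind} '{key}' must be {expected.__name__}")


@dataclass
class Cab:
    """Hypothetical stand-in for a cab definition: an atomic task plus its schema."""
    name: str
    inputs: Dict[str, type]
    outputs: Dict[str, type]
    func: Callable[..., Dict[str, Any]]

    def run(self, **params: Any) -> Dict[str, Any]:
        _check(self.name, "input", self.inputs, params)
        result = self.func(**params)
        _check(self.name, "output", self.outputs, result)
        return result


@dataclass
class Recipe:
    """A linear list of (label, task, parameters); a task may itself be a Recipe."""
    name: str
    steps: List[Tuple[str, Any, Dict[str, Any]]]

    def run(self, **params: Any) -> Dict[str, Any]:
        results: Dict[str, Dict[str, Any]] = {"recipe": params}
        outputs: Dict[str, Any] = {}
        for label, task, step_params in self.steps:
            # Replace each Ref with the value it points at; pass literals through.
            resolved = {
                k: results[v.step][v.key] if isinstance(v, Ref) else v
                for k, v in step_params.items()
            }
            outputs = task.run(**resolved)
            results[label] = outputs
        return outputs  # a recipe exposes the outputs of its last step


# Two toy tasks standing in for real data reduction tools.
scale = Cab("scale", {"values": list, "factor": float}, {"values": list},
            lambda values, factor: {"values": [v * factor for v in values]})
total = Cab("total", {"values": list}, {"total": float},
            lambda values: {"total": float(sum(values))})

# Configuration (the factor) is passed in as a parameter, separate from the step logic.
inner = Recipe("scale-and-sum", [
    ("scale", scale, {"values": Ref("recipe", "values"), "factor": Ref("recipe", "factor")}),
    ("total", total, {"values": Ref("scale", "values")}),
])

# The inner recipe is used as a single step of an outer recipe.
outer = Recipe("pipeline", [
    ("reduce", inner, {"values": Ref("recipe", "values"), "factor": 2.0}),
])

print(outer.run(values=[1.0, 2.0, 3.0]))
```

Because a `Recipe` exposes the same `run` interface as a `Cab`, nesting needs no special handling here. Checking the schema at every step boundary means a mistyped parameter fails early, before the task runs.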
Taxonomy
Topics: Scientific Computing and Data Management
