Data management and execution systems for the Rubin Observatory Science Pipelines
Nate B. Lust (1), Tim Jenness (2), James F. Bosch (1), Andrei Salnikov (3), Nathan M. Pease (3), Michelle Gower (4), Mikolaj Kowalik (4), Gregory P. Dubois-Felsmann (5), Fritz Mueller (3), Pim Schellart (1) ((1) Princeton University

TL;DR
The paper describes the Rubin Observatory's data management and execution system, including the Butler storage layer, metadata registry, and scalable pipeline execution infrastructure, enabling efficient large-scale data processing.
Contribution
Introduces a data management and pipeline execution system that hides the details of data storage and retrieval from science algorithms and scales from laptops to data centers.
Findings
Effective data abstraction layer simplifies algorithm development.
Scalable infrastructure supports diverse computing environments.
Automated pipeline creation enhances processing efficiency.
Abstract
We present the Rubin Observatory system for data storage/retrieval and pipelined code execution. The layer for data storage and retrieval is named the Butler. It consists of a relational database, known as the registry, to keep track of metadata and relations, and a system to manage where the data is located, named the datastore. Together these systems create an abstraction layer that science algorithms can be written against. This abstraction layer manages the complexities of the large data volumes expected and allows algorithms to be written independently, yet be tied together automatically into a coherent processing pipeline. This system consists of tools which execute these pipelines by transforming them into execution graphs which contain concrete data stored in the Butler. The pipeline infrastructure is designed to be scalable in nature, allowing execution on environments ranging…
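The split between registry and datastore, and the expansion of a pipeline into an execution graph over concrete data, can be illustrated with a small toy model. This is only a sketch under stated assumptions and not the Rubin Observatory API: `ToyButler`, `TASKS`, `build_execution_graph`, `run`, the dataset type names, and the `mem://` locations are all hypothetical names chosen for illustration. An in-memory SQLite table stands in for the registry and a dictionary stands in for the datastore.

```python
import sqlite3
from graphlib import TopologicalSorter  # Python 3.9+


class ToyButler:
    """Hypothetical stand-in: a registry (SQLite) plus a datastore (dict)."""

    def __init__(self):
        # Registry: relational metadata recording which datasets exist and where.
        self.registry = sqlite3.connect(":memory:")
        self.registry.execute(
            "CREATE TABLE dataset (dataset_type TEXT, data_id TEXT, "
            "location TEXT, PRIMARY KEY (dataset_type, data_id))"
        )
        # Datastore: maps a location to the stored object.
        self.datastore = {}

    def put(self, obj, dataset_type, data_id):
        location = f"mem://{dataset_type}/{data_id}"  # illustrative location scheme
        self.datastore[location] = obj
        self.registry.execute(
            "INSERT INTO dataset VALUES (?, ?, ?)", (dataset_type, data_id, location)
        )

    def get(self, dataset_type, data_id):
        # Callers name what they want; the registry resolves where it lives.
        row = self.registry.execute(
            "SELECT location FROM dataset WHERE dataset_type = ? AND data_id = ?",
            (dataset_type, data_id),
        ).fetchone()
        if row is None:
            raise LookupError(f"no dataset {dataset_type!r} for {data_id!r}")
        return self.datastore[row[0]]

    def data_ids(self, dataset_type):
        rows = self.registry.execute(
            "SELECT data_id FROM dataset WHERE dataset_type = ? ORDER BY data_id",
            (dataset_type,),
        )
        return [r[0] for r in rows]


# A pipeline as independent tasks: name -> (input type, output type, function).
# The tasks only name dataset types; they never refer to each other directly.
TASKS = {
    "measure": ("calibrated", "catalog", lambda x: x * 10),
    "calibrate": ("raw", "calibrated", lambda x: x - 1),
}


def build_execution_graph(butler, tasks, root_type):
    """Expand the abstract pipeline into (task, data_id) nodes with dependencies."""
    producers = {out: name for name, (_, out, _) in tasks.items()}
    graph = {}
    for data_id in butler.data_ids(root_type):
        for name, (inp, _, _) in tasks.items():
            # A node depends on whichever task produces its input, if any.
            deps = {(producers[inp], data_id)} if inp in producers else set()
            graph[(name, data_id)] = deps
    return graph


def run(butler, tasks, graph):
    """Execute nodes in dependency order, reading and writing via the butler."""
    for name, data_id in TopologicalSorter(graph).static_order():
        inp, out, func = tasks[name]
        butler.put(func(butler.get(inp, data_id)), out, data_id)
```

In this sketch the tasks are linked only through the dataset types they consume and produce, so the ordering is derived from the data rather than written by hand. A caller would store some `raw` datasets with `put`, call `build_execution_graph(butler, TASKS, "raw")`, pass the result to `run`, and then read the outputs back with `get`.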
Taxonomy
Topics: Distributed and Parallel Computing Systems · Scientific Computing and Data Management · Advanced Data Storage Technologies
