The Vera C. Rubin Observatory Data Butler and Pipeline Execution System
Tim Jenness, James F. Bosch, Nate B. Lust, Nathan M. Pease, Michelle Gower, Mikolaj Kowalik, Gregory P. Dubois-Felsmann, Fritz Mueller, and Pim Schellart

TL;DR
The Rubin Observatory Data Butler and Pipeline Execution System provide an abstracted, scalable framework for constructing and executing science pipelines, facilitating flexible data management and processing during observatory operations.
Contribution
This paper introduces the Data Butler and pipeline system, enabling scalable, flexible data processing and management for the Rubin Observatory's scientific workflows.
Findings
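System is in daily use during Rubin construction and early operations
Supports execution at scale on object stores and multi-node clusters, or on a laptop with a local file system
Lets pipelines be constructed from algorithmic tasks that do not depend on file locations or file formats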
Abstract
The Rubin Observatory's Data Butler is designed to allow data file location and file formats to be abstracted away from the people writing the science pipeline algorithms. The Butler works in conjunction with the workflow graph builder to allow pipelines to be constructed from the algorithmic tasks. These pipelines can be executed at scale using object stores and multi-node clusters, or on a laptop using a local file system. The Butler and pipeline system are now in daily use during Rubin construction and early operations.
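To make the abstraction described in the abstract concrete, the following is a minimal sketch and not the Rubin software itself. `ToyButler`, `measure_mean`, the dataset type names, and the JSON-on-disk storage are all hypothetical, chosen only to illustrate how algorithm code can request data by dataset type and data ID without ever seeing a path or a file format.

```python
import json
import tempfile
from pathlib import Path


class ToyButler:
    """Hypothetical stand-in for illustration; this is not the real Butler API."""

    def __init__(self, root):
        self.root = Path(root)
        # Registry: (dataset type, sorted data ID items) -> file path.
        self.registry = {}

    def _key(self, dataset_type, data_id):
        return (dataset_type, tuple(sorted(data_id.items())))

    def put(self, obj, dataset_type, data_id):
        # The storage layout and file format (JSON here) are decided
        # inside the butler, never by the caller.
        name = "_".join(f"{k}{v}" for k, v in sorted(data_id.items()))
        path = self.root / dataset_type / f"{name}.json"
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(json.dumps(obj))
        self.registry[self._key(dataset_type, data_id)] = path

    def get(self, dataset_type, data_id):
        path = self.registry[self._key(dataset_type, data_id)]
        return json.loads(path.read_text())


def measure_mean(butler, data_id):
    """An 'algorithmic task': it contains no paths and no format handling."""
    pixels = butler.get("raw_pixels", data_id)
    butler.put({"mean": sum(pixels) / len(pixels)}, "pixel_stats", data_id)


with tempfile.TemporaryDirectory() as tmp:
    butler = ToyButler(tmp)
    data_id = {"visit": 42, "detector": 7}
    butler.put([1.0, 2.0, 3.0], "raw_pixels", data_id)
    measure_mean(butler, data_id)
    stats = butler.get("pixel_stats", data_id)
    print(stats)
```

Because `measure_mean` only talks to the butler object, replacing the local directory with a different storage backend would, in this sketch, require changes to `ToyButler` alone and none to the task.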
Taxonomy
Topics: Scientific Computing and Data Management · Environmental Monitoring and Data Management · Distributed and Parallel Computing Systems
