The Vera C. Rubin Observatory Data Butler and Pipeline Execution System

Tim Jenness; James F. Bosch; Nate B. Lust; Nathan M. Pease; Michelle Gower; Mikolaj Kowalik; Gregory P. Dubois-Felsmann; Fritz Mueller; Pim Schellart

arXiv:2206.14941·astro-ph.IM·July 1, 2022·1 cites

TL;DR

The Rubin Observatory Data Butler hides file locations and file formats from the authors of science pipeline algorithms. Together with the workflow graph builder, it lets pipelines be assembled from algorithmic tasks and run either at scale on object stores and multi-node clusters or on a laptop with a local file system.

Contribution

This paper introduces the Data Butler and pipeline system, enabling scalable, flexible data processing and management for the Rubin Observatory's scientific workflows.

Findings

01 System is in daily use during Rubin construction

02 Supports execution on object stores and local systems

03 Enhances pipeline flexibility and scalability

Abstract

The Rubin Observatory's Data Butler is designed to allow data file location and file formats to be abstracted away from the people writing the science pipeline algorithms. The Butler works in conjunction with the workflow graph builder to allow pipelines to be constructed from the algorithmic tasks. These pipelines can be executed at scale using object stores and multi-node clusters, or on a laptop using a local file system. The Butler and pipeline system are now in daily use during Rubin construction and early operations.
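
The abstraction described in the abstract can be illustrated with a minimal sketch. This is not the Rubin Butler API: `ToyButler`, `FORMATTERS`, `measure_mean`, `run_demo`, and the dataset type names are all hypothetical, invented here only to show the idea that algorithm code asks for a dataset by type and data ID while the storage layer decides where the file lives and how it is serialized.

```python
import json
import pickle
import tempfile
from pathlib import Path

# Hypothetical formatters: each pair is (writer, reader) for one file format.
FORMATTERS = {
    "json": (lambda obj, path: path.write_text(json.dumps(obj)),
             lambda path: json.loads(path.read_text())),
    "pickle": (lambda obj, path: path.write_bytes(pickle.dumps(obj)),
               lambda path: pickle.loads(path.read_bytes())),
}


class ToyButler:
    """Toy stand-in for a data butler; NOT the real Rubin interface."""

    def __init__(self, root, formats):
        self.root = Path(root)
        self.formats = formats  # dataset type -> format name
        self.registry = {}      # (dataset type, data ID) -> file path

    def _key(self, dataset_type, data_id):
        return (dataset_type, tuple(sorted(data_id.items())))

    def put(self, obj, dataset_type, data_id):
        # The butler, not the caller, chooses the path and the format.
        fmt = self.formats[dataset_type]
        name = "_".join(f"{k}{v}" for k, v in sorted(data_id.items()))
        path = self.root / dataset_type / f"{name}.{fmt}"
        path.parent.mkdir(parents=True, exist_ok=True)
        FORMATTERS[fmt][0](obj, path)
        self.registry[self._key(dataset_type, data_id)] = path

    def get(self, dataset_type, data_id):
        path = self.registry[self._key(dataset_type, data_id)]
        return FORMATTERS[self.formats[dataset_type]][1](path)


def measure_mean(butler, data_id):
    """An 'algorithmic task': it never sees a file path or a file format."""
    pixels = butler.get("raw_pixels", data_id)
    butler.put({"mean": sum(pixels) / len(pixels)}, "summary", data_id)


def run_demo():
    with tempfile.TemporaryDirectory() as tmp:
        butler = ToyButler(tmp, {"raw_pixels": "pickle", "summary": "json"})
        data_id = {"visit": 42, "detector": 7}
        butler.put([1.0, 2.0, 3.0], "raw_pixels", data_id)
        measure_mean(butler, data_id)
        return butler.get("summary", data_id)


if __name__ == "__main__":
    print(run_demo())
```

In this sketch, swapping the local directory for an object store or changing a dataset's format would only touch `ToyButler` and `FORMATTERS`; `measure_mean` would stay unchanged, which is the separation the abstract describes.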

Taxonomy

Topics: Scientific Computing and Data Management · Environmental Monitoring and Data Management · Distributed and Parallel Computing Systems