Development of a High Throughput Cloud-Based Data Pipeline for 21 cm Cosmology
Ruby Byrne, Daniel Jacobs

TL;DR
This paper presents a case study of a scalable, cost-effective cloud-based data pipeline for 21 cm cosmology. The pipeline uses Amazon Web Services (AWS) to process large data sets from the Murchison Widefield Array (MWA), and the paper explores the tradeoff between cost and turnaround time when using the AWS spot market.
Contribution
It describes a cloud-based workflow on AWS for processing large astronomical data sets from the MWA cosmology experiment, with an emphasis on cost efficiency and scalability.
Findings
The AWS cloud platform allows the data analysis pipeline to be tested and run efficiently and economically.
Using the AWS spot market reduces costs at the expense of longer processing turnaround times.
A Monte Carlo simulation is used to explore this tradeoff between cost and turnaround time.
Abstract
We present a case study of a cloud-based computational workflow for processing large astronomical data sets from the Murchison Widefield Array (MWA) cosmology experiment. Cloud computing is well-suited to large-scale, episodic computation because it offers extreme scalability in a pay-for-use model. This facilitates fast turnaround times for testing computationally expensive analysis techniques. We describe how we have used the Amazon Web Services (AWS) cloud platform to efficiently and economically test and implement our data analysis pipeline. We discuss the challenges of working with the AWS spot market, which reduces costs at the expense of longer processing turnaround times, and we explore this tradeoff with a Monte Carlo simulation.
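The abstract mentions a Monte Carlo simulation of the spot-market tradeoff but does not describe its model. The sketch below is a hypothetical illustration of the general idea, not the authors' simulation. It assumes that a batch of independent tasks runs in parallel, that a spot instance can be reclaimed in any hour with a fixed probability, that an interruption discards all progress on the task (no checkpointing), and that every started hour is billed. All prices and probabilities are placeholder values, not figures from the paper.

```python
import random
import statistics

# All parameter values below are hypothetical placeholders, not figures from the paper.
ON_DEMAND_PRICE = 1.00   # $ per instance-hour (hypothetical)
SPOT_PRICE = 0.30        # $ per instance-hour (hypothetical)
TASK_HOURS = 4           # compute hours needed per task (hypothetical)
P_INTERRUPT = 0.05       # chance a spot instance is reclaimed during any given hour (hypothetical)
RESTART_WAIT = 1         # unbilled hours spent waiting for spot capacity after an interruption (hypothetical)


def run_task(rng, spot):
    """Simulate one task and return (wall_clock_hours, billed_hours).

    Assumes no checkpointing: an interruption discards all progress on the task,
    and the hour in which the interruption occurs is still billed.
    """
    if not spot:
        return TASK_HOURS, TASK_HOURS
    wall = billed = progress = 0
    while progress < TASK_HOURS:
        wall += 1
        billed += 1
        if rng.random() < P_INTERRUPT:
            progress = 0            # work lost when the instance is reclaimed
            wall += RESTART_WAIT    # idle time before a replacement instance starts
        else:
            progress += 1
    return wall, billed


def simulate_batch(rng, n_tasks, spot):
    """Run n_tasks in parallel: turnaround is the slowest task, cost sums all billed hours."""
    results = [run_task(rng, spot) for _ in range(n_tasks)]
    turnaround = max(wall for wall, _ in results)
    price = SPOT_PRICE if spot else ON_DEMAND_PRICE
    cost = price * sum(billed for _, billed in results)
    return turnaround, cost


def monte_carlo(n_trials=1000, n_tasks=100, spot=True, seed=0):
    """Average turnaround (hours) and cost ($) over many randomized batches."""
    rng = random.Random(seed)
    samples = [simulate_batch(rng, n_tasks, spot) for _ in range(n_trials)]
    mean_turnaround = statistics.mean(t for t, _ in samples)
    mean_cost = statistics.mean(c for _, c in samples)
    return mean_turnaround, mean_cost


if __name__ == "__main__":
    for spot in (False, True):
        t, c = monte_carlo(spot=spot)
        label = "spot" if spot else "on-demand"
        print(f"{label:>9}: mean turnaround {t:.1f} h, mean cost ${c:.2f}")
```

Because turnaround is set by the slowest task in the batch, even a small per-hour interruption probability tends to lengthen the batch's turnaround noticeably as the number of parallel tasks grows, while the lower spot price keeps total cost down. Sweeping `P_INTERRUPT` or `SPOT_PRICE` in a sketch like this is one way to map out the cost versus turnaround tradeoff the abstract describes.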
