Development of a High Throughput Cloud-Based Data Pipeline for 21 cm Cosmology

Ruby Byrne; Daniel Jacobs

arXiv:2009.10223·astro-ph.IM·March 4, 2021·Astron. Comput.

TL;DR

This case study describes a scalable, cost-effective AWS-based data pipeline for processing 21 cm cosmology data from the Murchison Widefield Array (MWA), and explores the tradeoff between cost and turnaround time on the AWS spot market.

Contribution

The paper presents a cloud-based workflow for processing large MWA cosmology data sets, highlighting the cost-efficiency and scalability of AWS cloud services.

Findings

01

The AWS cloud platform enables efficient and economical testing and running of the MWA data analysis pipeline.

02

Using the AWS spot market reduces costs but lengthens processing turnaround times.

03

A Monte Carlo simulation is used to explore the tradeoff between spot market savings and turnaround time.

Abstract

We present a case study of a cloud-based computational workflow for processing large astronomical data sets from the Murchison Widefield Array (MWA) cosmology experiment. Cloud computing is well-suited to large-scale, episodic computation because it offers extreme scalability in a pay-for-use model. This facilitates fast turnaround times for testing computationally expensive analysis techniques. We describe how we have used the Amazon Web Services (AWS) cloud platform to efficiently and economically test and implement our data analysis pipeline. We discuss the challenges of working with the AWS spot market, which reduces costs at the expense of longer processing turnaround times, and we explore this tradeoff with a Monte Carlo simulation.
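
To make the spot market tradeoff concrete, here is a minimal toy Monte Carlo sketch in Python. It is not the authors' simulation. The hourly interruption model, the absence of checkpointing, the restart delay, the prices, and the function names `simulate_spot_job` and `compare_spot_to_on_demand` are all hypothetical assumptions chosen only for illustration.

```python
import random
import statistics


def simulate_spot_job(work_hours, interrupt_prob, restart_delay, rng):
    """Simulate one job on an interruptible instance (toy model).

    Assumptions (hypothetical, not from the paper): the job advances in
    one-hour steps, each hour is interrupted with probability
    `interrupt_prob`, an interruption discards all progress (no
    checkpointing), and a new instance becomes available after
    `restart_delay` unbilled hours.
    Returns (elapsed_hours, billed_hours).
    """
    elapsed = 0.0
    billed = 0.0
    done = 0
    while done < work_hours:
        elapsed += 1.0
        billed += 1.0  # every running hour is billed, even if interrupted
        if rng.random() < interrupt_prob:
            done = 0  # progress lost, start over
            elapsed += restart_delay  # wait for replacement capacity
        else:
            done += 1
    return elapsed, billed


def compare_spot_to_on_demand(n_trials=10000, work_hours=4,
                              interrupt_prob=0.05, restart_delay=0.5,
                              on_demand_price=1.0, spot_price=0.3, seed=0):
    """Estimate mean turnaround and cost for spot versus on-demand.

    All default values are illustrative placeholders, not real AWS prices
    or measured interruption rates.
    """
    rng = random.Random(seed)
    runs = [simulate_spot_job(work_hours, interrupt_prob, restart_delay, rng)
            for _ in range(n_trials)]
    return {
        "on_demand_hours": float(work_hours),
        "on_demand_cost": work_hours * on_demand_price,
        "spot_mean_hours": statistics.mean(r[0] for r in runs),
        "spot_mean_cost": spot_price * statistics.mean(r[1] for r in runs),
    }


if __name__ == "__main__":
    for key, value in compare_spot_to_on_demand().items():
        print(f"{key}: {value:.2f}")
```

Under these assumed parameters the spot runs cost less on average but take longer to finish, which is the qualitative tradeoff the abstract describes. Raising `interrupt_prob` or `restart_delay` shows how quickly the turnaround penalty grows in this toy model.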
